Add OpenACC 2.6 `serial' construct support
diff mbox series

Message ID 20191107095213.11618-1-frederik@codesourcery.com
State New
Headers show
Series
  • Add OpenACC 2.6 `serial' construct support
Related show

Commit Message

Harwath, Frederik Nov. 7, 2019, 9:52 a.m. UTC
Hi,
this patch implements the OpenACC 2.6 "serial" construct.
It has been tested by running the testsuite with nvptx-none
offloading on x86_64-pc-linux-gnu.

Best regards,
Frederik
 
-------------------- 8< -------------------

    The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
    is equivalent to a `parallel' construct with clauses `num_gangs(1)
     num_workers(1) vector_length(1)' implied.
    These clauses are therefore not supported with the `serial'
    construct. All the remaining clauses accepted with `parallel' are also
    accepted with `serial'.

    The `serial' construct is implemented like `parallel', except for
    hardcoding dimensions rather than taking them from the relevant
    clauses, in `expand_omp_target'.

    Separate codes are used to denote the `serial' construct throughout the
    middle end, even though the mapping of `serial' to an equivalent
    `parallel' construct could have been done in the individual language
    frontends. In particular, this allows to distinguish between `parallel'
    and `serial' in warnings, error messages, dumps etc.

    2019-11-07  Maciej W. Rozycki  <macro@codesourcery.com>
		Tobias Burnus  <tobias@codesourcery.com>
                Frederik Harwath  <frederik@codesourcery.com>

	gcc/
	* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
	enumeration constant.
	(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
	(is_gimple_omp_offloaded): Likewise.
	* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
	constant.  Adjust the value of ORT_NONE accordingly.
	(is_gimple_stmt): Handle OACC_SERIAL.
	(oacc_default_clause): Handle ORT_ACC_SERIAL.
	(gomp_needs_data_present): Likewise.
	(gimplify_adjust_omp_clauses): Likewise.
	(gimplify_omp_workshare): Handle OACC_SERIAL.
	(gimplify_expr): Likewise.
	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Add parameter.
	* omp-expand.c (expand_omp_target):
	Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
	(build_omp_regions_1, omp_make_gimple_edges): Likewise.
	* omp-low.c (is_oacc_parallel): Rename function to...
	(is_oacc_parallel_or_serial): ... this.
	Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
	(scan_sharing_clauses): Adjust accordingly.
	(scan_omp_for): Likewise.
	(lower_oacc_head_mark): Likewise.
	(convert_from_firstprivate_int): Likewise.
	(lower_omp_target): Likewise.
	(check_omp_nesting_restrictions): Handle
	GF_OMP_TARGET_KIND_OACC_SERIAL.
	(lower_oacc_reductions): Likewise.
	(lower_omp_target): Likewise.
	* tree.def (OACC_SERIAL): New tree code.
	* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.

	* doc/generic.texi (OpenACC): Document OACC_SERIAL.

	gcc/c-family/
	* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
	constant.
	* c-pragma.c (oacc_pragmas): Add "serial" entry.

	gcc/c/
	* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
	(c_parser_oacc_kernels_parallel): Rename function to...
	(c_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
	(c_parser_omp_construct): Update accordingly.

	gcc/cp/
	* constexpr.c (potential_constant_expression_1): Handle
	OACC_SERIAL.
	* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
	(cp_parser_oacc_kernels_parallel): Rename function to...
	(cp_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
	(cp_parser_omp_construct): Update accordingly.
	(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL.  Fix alphabetic
	order.
	* pt.c (tsubst_expr): Handle OACC_SERIAL.

	gcc/fortran/
	* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
	ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
	enumeration constants.
	(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
	enumeration constants.
	* match.h (gfc_match_oacc_serial): New prototype.
	(gfc_match_oacc_serial_loop): Likewise.
	* dump-parse-tree.c (show_omp_node, show_code_node): Handle
	EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
	* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
	* openmp.c (OACC_SERIAL_CLAUSES): New macro.
	(gfc_match_oacc_serial_loop): New function.
	(gfc_match_oacc_serial): Likewise.
	(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
	(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
	(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
	EXEC_OACC_SERIAL_LOOP.
	(gfc_resolve_oacc_directive): Likewise.
	* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
	and "serial loop".
	(next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
	(gfc_ascii_statement): Likewise.  Handle ST_OACC_END_SERIAL_LOOP
	and ST_OACC_END_SERIAL.
	(parse_oacc_structured_block): Handle ST_OACC_SERIAL.
	(parse_oacc_loop): Handle ST_OACC_SERIAL_LOOP and
	ST_OACC_END_SERIAL_LOOP.
	(parse_executable): Handle ST_OACC_SERIAL_LOOP and
	ST_OACC_SERIAL.
	(is_oacc): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
	* resolve.c (gfc_resolve_blocks, gfc_resolve_code): Likewise.
	* st.c (gfc_free_statement): Likewise.
	* trans-openmp.c (gfc_trans_oacc_construct): Handle
	EXEC_OACC_SERIAL.
	(gfc_trans_oacc_combined_directive): Handle
	EXEC_OACC_SERIAL_LOOP.
	(gfc_trans_oacc_directive): Handle EXEC_OACC_SERIAL_LOOP and
	EXEC_OACC_SERIAL.
	* trans.c (trans_code): Likewise.

	gcc/testsuite/
	* c-c++-common/goacc/serial-dims.c: New test.
	* gfortran.dg/goacc/serial-dims.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/serial-dims.c: New test.
	* testsuite/libgomp.oacc-fortran/serial-dims-aux.c: New test.
	* testsuite/libgomp.oacc-fortran/serial-dims.f90: New test.
---
 gcc/c-family/c-pragma.c                       |  1 +
 gcc/c-family/c-pragma.h                       |  1 +
 gcc/c/c-parser.c                              | 34 ++++++-
 gcc/cp/constexpr.c                            |  1 +
 gcc/cp/parser.c                               | 35 ++++++-
 gcc/cp/pt.c                                   |  1 +
 gcc/doc/generic.texi                          |  5 +
 gcc/fortran/dump-parse-tree.c                 |  6 ++
 gcc/fortran/gfortran.h                        | 13 +--
 gcc/fortran/match.c                           |  3 +-
 gcc/fortran/match.h                           |  2 +
 gcc/fortran/openmp.c                          | 35 ++++++-
 gcc/fortran/parse.c                           | 30 +++++-
 gcc/fortran/resolve.c                         |  6 ++
 gcc/fortran/st.c                              |  2 +
 gcc/fortran/trans-openmp.c                    | 13 ++-
 gcc/fortran/trans.c                           |  2 +
 gcc/gimple-pretty-print.c                     |  3 +
 gcc/gimple.h                                  |  3 +
 gcc/gimplify.c                                | 20 +++-
 gcc/omp-expand.c                              | 47 ++++++++--
 gcc/omp-low.c                                 | 33 ++++---
 .../c-c++-common/goacc/serial-dims.c          | 12 +++
 .../gfortran.dg/goacc/serial-dims.f90         | 40 ++++++++
 gcc/tree-pretty-print.c                       |  4 +
 gcc/tree.def                                  |  6 ++
 .../libgomp.oacc-c-c++-common/serial-dims.c   | 92 +++++++++++++++++++
 .../libgomp.oacc-fortran/serial-dims-aux.c    | 41 +++++++++
 .../libgomp.oacc-fortran/serial-dims.f90      | 89 ++++++++++++++++++
 29 files changed, 535 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/serial-dims.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/serial-dims.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90

Comments

Thomas Schwinge Nov. 11, 2019, 11:54 a.m. UTC | #1
Hi Frederik!

On 2019-11-07T10:52:13+0100, Frederik Harwath <frederik@codesourcery.com> wrote:
> this patch implements the OpenACC 2.6 "serial" construct.

Thanks for taking on that one.

> It has been tested by running the testsuite with nvptx-none
> offloading on x86_64-pc-linux-gnu.

This is OK for trunk with the attached "incremental, into Add OpenACC 2.6
`serial' construct support" merged in.  (No need to re-test; I've just
done that.)  In the incremental patch, I'm streamlining some code,
format/handle 'serial' the same as existing 'parallel', etc., plus a few
more things, see my comments in the patch review below.

To record the review effort, please include "Reviewed-by: Thomas Schwinge
<thomas@codesourcery.com>" in the commit log, see
<https://gcc.gnu.org/wiki/Reviewed-by>.


I'm working on an additional patch to handle 'serial' in more cases where
it's wrong to diverge from 'parallel' (this tells us: a lot of testsuite
coverage is missing...), etc.

Thus I'm adding a lot of testsuite coverage.  I'm not asking you to work
on that, as that's not a feasible task for someone who's still new to all
this, to figure out the appropriate tests that should be
augmented/duplicated for 'serial'.  And, coming up with a list for you to
work though, I suppose would be more time consuming for me instead of
just doing it myself.  ;-)

However, you're of course always encouraged to learn from reading such
patches, and ask questions for any things unclear, of course.


>     The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
>     is equivalent to a `parallel' construct with clauses `num_gangs(1)
>      num_workers(1) vector_length(1)' implied.

..., and that's how it -- basically -- is implemented, and thus every
usage of 'serial' gets an annoying 'warning: using vector_length (32),
ignoring 1' for nvptx offloading compilation.  I wonder if we should
sinply disable that nvptx back end warning when an 'oacc serial'
attribute is present?  Or, if we should not, to highlight the issue that
I recently filed <https://github.com/OpenACC/openacc-spec/issues/238>
"OpenACC 'serial' construct might not actually be serial", discovered
during this review process?  (Summary: by GCC have a default of
'vector_length (32)', we do get vector parallelism with 'loop vector', or
'routine vector' inside 'serial' regions -- not clear if that's
intentional, and/or correct.)

>     These clauses are therefore not supported with the `serial'
>     construct. All the remaining clauses accepted with `parallel' are also
>     accepted with `serial'.
>
>     The `serial' construct is implemented like `parallel', except for
>     hardcoding dimensions rather than taking them from the relevant
>     clauses, in `expand_omp_target'.

>     Separate codes are used to denote the `serial' construct throughout the
>     middle end, even though the mapping of `serial' to an equivalent
>     `parallel' construct could have been done in the individual language
>     frontends.

Yeah, I'd pointed this out early on, and I still wonder if early
translating 'serial' into 'parallel num_gangs (1) num_workers (1)
vector_length (1)' (if that's really just what it is) would be better?
Would save quite some effort (duplicate all 'parallel' handling for
'serial').  On the other hand, we'd then need a different mechanism for:

>     In particular, this allows to distinguish between `parallel'
>     and `serial' in warnings, error messages, dumps etc.

... that (or just say "compute construct" instead of 'parallel',
'kernels', 'serial').  But we'll eventually want such a more general
mechnisma anyway; <https://gcc.gnu.org/PR65095> "Adapt OpenMP diagnostic
messages for OpenACC".

So -- we've now got that implementation, and we can still clean it up
later on.


> 	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Add parameter.

Not anymore.


>  create mode 100644 gcc/testsuite/gfortran.dg/goacc/serial-dims.f90

>  create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
>  create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90

Thanks to you (and/or Tobias, I suppose) for adding some Fortran
testsuite coversage, because:

> --- a/gcc/fortran/parse.c
> +++ b/gcc/fortran/parse.c
> @@ -683,6 +683,9 @@ decode_oacc_directive (void)
>        matcha ("end parallel loop", gfc_match_omp_eos_error,
>  	      ST_OACC_END_PARALLEL_LOOP);
>        matcha ("end parallel", gfc_match_omp_eos_error, ST_OACC_END_PARALLEL);
> +      matcha ("end serial loop", gfc_match_omp_eos_error,
> +	      ST_OACC_END_SERIAL_LOOP);
> +      matcha ("end serial", gfc_match_omp_eos_error, ST_OACC_END_SERIAL);
>        matcha ("enter data", gfc_match_oacc_enter_data, ST_OACC_ENTER_DATA);
>        matcha ("exit data", gfc_match_oacc_exit_data, ST_OACC_EXIT_DATA);
>        break;

Wow, wow.  I see this has not been present in the og8 and og9 commits of
the OpenACC 'serial' changes.  This tells us: the OpenACC 'serial'
construct has *not at all* been tested with Fortran; any compilation
attempt would've stopped early in the front end:

       25 |   !$acc end serial loop
          |         1
    Error: Unclassifiable OpenACC directive at (1)

       28 |   !$acc end serial
          |         1
    Error: Unclassifiable OpenACC directive at (1)

Thanks for fixing that.


> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h

> @@ -182,6 +182,7 @@ enum gf_mask {
>      GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9,
>      GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
>      GF_OMP_TARGET_KIND_OACC_HOST_DATA = 11,
> +    GF_OMP_TARGET_KIND_OACC_SERIAL = 12,

That's not wrong, but I've still moved 'GF_OMP_TARGET_KIND_OACC_SERIAL'
next to/after the existing 'GF_OMP_TARGET_KIND_OACC_PARALLEL',
'GF_OMP_TARGET_KIND_OACC_KERNELS' (it's OK to renumber 'enum gf_mask'
items), so that there's (at least some) consistency in the the
'parallel', 'kernels', 'serial' ordering (which is the order they appear
in the current specification), that we shall use unless alphabetical
ordering is used.


> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> @@ -7518,7 +7526,7 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
>  
>    /* In a parallel region, loops are implicitly INDEPENDENT.  */
>    omp_context *tgt = enclosing_target_ctx (ctx);
> -  if (!tgt || is_oacc_parallel (tgt))
> +  if (!tgt || is_oacc_parallel_or_serial (tgt))
>      tag |= OLF_INDEPENDENT;

I would agree, but from a (very) quick look, I don't think the OpenACC
specification actually says anything on that topic.  Something I'll get
that clarified.


> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/serial-dims.c
> @@ -0,0 +1,12 @@
> +/* Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
> +   num_workers, vector_length with the serial construct.  */
> +
> +void f(void)
> +{
> +#pragma acc serial num_gangs (1) /* { dg-error "'num_gangs' is not valid for '#pragma acc serial'" } */
> +  ;
> +#pragma acc serial num_workers (1) /* { dg-error "'num_workers' is not valid for '#pragma acc serial'" } */
> +  ;
> +#pragma acc serial vector_length (1) /* { dg-error "'vector_length' is not valid for '#pragma acc serial'" } */
> +  ;
> +}

I've merged that into the existing 'c-c++-common/goacc/parallel-dims-2.c'.

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90
> @@ -0,0 +1,40 @@
> +! Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
> +! num_workers, vector_length with the serial construct.
> +
> +subroutine s()
> +  integer :: i
> +  !$acc parallel
> +  !$acc end parallel
> +
> +  !$acc parallel loop
> +  do i = 1, 5
> +  end do
> +
> +  !$acc parallel loop
> +  do i = 1, 5
> +  end do
> +  !$acc end parallel loop
> +
> +  !$acc serial loop
> +  do i = 1, 5
> +  end do
> +
> +  !$acc serial loop
> +  do i = 1, 5
> +  end do
> +  !$acc end serial loop
> +
> +  !$acc serial
> +  !$acc end serial
> +end subroutine s
> +
> +subroutine f()
> +!$acc serial num_gangs (1)  ! { dg-error "Failed to match clause at" }
> +!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
> +
> +!$acc serial num_workers (1)  ! { dg-error "Failed to match clause at" }
> +!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
> +
> +!$acc serial vector_length (1)  ! { dg-error "Failed to match clause at" }
> +!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
> +end subroutine f

Similarly, for symmetry, moved into (new)
'gfortran.dg/goacc/parallel-dims-2.f90'.


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
> @@ -0,0 +1,92 @@
> +/* OpenACC dimensions with the serial construct.  */

This I've merged into the existing
'libgomp.oacc-c-c++-common/parallel-dims.c', instead of duplicating
infrastructure here, and doing some things slightly differently (possibly
due to incorrect divergence between 'serial' and 'parallel' handling, as
I mentioned above, which I shall soon fix).

> +  /* Serial OpenACC constructs must get launched as 1 x 1 x 1.  */
> +  {
> +    int gangs_min, gangs_max;
> +    int workers_min, workers_max;
> +    int vectors_min, vectors_max;
> +    int gangs_actual, workers_actual, vectors_actual;
> +    int i, j, k;
> +
> +    gangs_min = workers_min = vectors_min = INT_MAX;
> +    gangs_max = workers_max = vectors_max = INT_MIN;
> +    gangs_actual = workers_actual = vectors_actual = 1;
> +#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */
> +    {
> +      if (acc_on_device (acc_device_nvidia))
> +	{
> +	  /* The GCC nvptx back end enforces vector_length (32).  */
> +	  vectors_actual = 32;

So, that's actually a good question, whether that is permissible --
that's <https://github.com/OpenACC/openacc-spec/issues/238> "OpenACC
'serial' construct might not actually be serial", as mentioned above.

> +	}
> +      else if (!acc_on_device (acc_device_host))
> +	__builtin_abort ();
> +#pragma acc loop gang \
> +  reduction (min: gangs_min, workers_min, vectors_min) \
> +  reduction (max: gangs_max, workers_max, vectors_max)
> +      for (i = 100 * gangs_actual; i > -100 * gangs_actual; i--)
> +#pragma acc loop worker \
> +  reduction (min: gangs_min, workers_min, vectors_min) \
> +  reduction (max: gangs_max, workers_max, vectors_max)
> +	for (j = 100 * workers_actual; j > -100 * workers_actual; j--)
> +#pragma acc loop vector \
> +  reduction (min: gangs_min, workers_min, vectors_min) \
> +  reduction (max: gangs_max, workers_max, vectors_max)
> +	  for (k = 100 * vectors_actual; k > -100 * vectors_actual; k--)
> +	    {
> +	      gangs_min = gangs_max = acc_gang ();
> +	      workers_min = workers_max = acc_worker ();
> +	      vectors_min = vectors_max = acc_vector ();
> +	    }
> +      if (gangs_min != 0 || gangs_max != gangs_actual - 1
> +	  || workers_min != 0 || workers_max != workers_actual - 1
> +	  || vectors_min != 0 || vectors_max != vectors_actual - 1)
> +	__builtin_abort ();
> +    }
> +  }

Per the OpenACC 'loop' directives specified here, that's testing
gang-partitioned, worker-partitioned, vector-partitioned execution mode.

We should also test gang-redundant, worker-single, vector-single
execution mode, which I've added.


The Fortran counter part (thanks for creating that!), I suppose, had not
yet been tested?

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
> @@ -0,0 +1,41 @@
> +/* OpenACC dimensions with the serial construct.  */
> +/* Used by serial-dims.f90.  */

I indicate where this has been copied from.

(Generally, getting rid of these wrapper functions is for another day.)

> +#include <limits.h>
> +#include <openacc.h>
> +#include <gomp-constants.h>
> +
> +/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper
> +   not behaving as expected for -O0.  */
> +#pragma acc routine seq
> +static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
> +{
> +  if (acc_on_device ((int) acc_device_host))
> +    return 0;
> +  else if (acc_on_device ((int) acc_device_nvidia))
> +    return __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
> +  else
> +    __builtin_abort ();
> +}
> +
> +#pragma acc routine seq
> +static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
> +{
> +  if (acc_on_device ((int) acc_device_host))
> +    return 0;
> +  else if (acc_on_device ((int) acc_device_nvidia))
> +    return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
> +  else
> +    __builtin_abort ();
> +}
> +
> +#pragma acc routine seq
> +static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
> +{
> +  if (acc_on_device ((int) acc_device_host))
> +    return 0;
> +  else if (acc_on_device ((int) acc_device_nvidia))
> +    return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
> +  else
> +    __builtin_abort ();
> +}

Compilation of 'libgomp.oacc-fortran/serial-dims.f90' fails:

    serial-dims.f90:(.text+0x124): undefined reference to `acc_gang'
    serial-dims.f90:(.text+0x130): undefined reference to `acc_gang'
    serial-dims.f90:(.text+0x13c): undefined reference to `acc_worker'
    serial-dims.f90:(.text+0x148): undefined reference to `acc_worker'
    serial-dims.f90:(.text+0x154): undefined reference to `acc_vector'
    serial-dims.f90:(.text+0x160): undefined reference to `acc_vector'

Have to remove 'static' from 'acc_gang', 'acc_worker', 'acc_vector'.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90
> @@ -0,0 +1,89 @@
> +! OpenACC dimensions with the serial construct.

This needs '{ dg-do run }' for torture testing.

> +
> +! { dg-additional-sources serial-dims-aux.c }
> +! { dg-warning "command line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }

We get:

    FAIL: libgomp.oacc-fortran/serial-dims.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O   (test for warnings, line 4)
    FAIL: libgomp.oacc-fortran/serial-dims.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O  (test for excess errors)

..., with:

    Excess errors:
    cc1: warning: command-line option '-fintrinsic-modules-path=[...]' is valid for Fortran but not for C

That's because that diagnostic doesn't appear on the line where the
'dg-warning' directive is present (line 4).  I changed that to
'dg-prune-output', but I wonder if there's a better way, so that we can
specify to expect/match a diagnostic without line number information -- I
can't remember whether such a thing exists.

However, that still fails: "command[-]line option" typo.  ;-)

> +module acc_routines
> +  implicit none (type, external)
> +
> +  interface
> +    integer function acc_gang() bind(C)
> +      !$acc routine seq
> +    end function acc_gang
> +
> +    integer function acc_worker() bind(C)
> +      !$acc routine seq
> +    end function acc_worker
> +
> +    integer function acc_vector() bind(C)
> +      !$acc routine seq
> +    end function acc_vector
> +  end interface
> +end module acc_routines

With '-Wall', we're told:

       14 |     integer function acc_gang() bind(C)
          |                             1
    Warning: Variable 'acc_gang' at (1) may not be a C interoperable kind but it is BIND(C) [-Wc-binding-type]

       22 |     integer function acc_vector() bind(C)
          |                               1
    Warning: Variable 'acc_vector' at (1) may not be a C interoperable kind but it is BIND(C) [-Wc-binding-type]

       18 |     integer function acc_worker() bind(C)
          |                               1
    Warning: Variable 'acc_worker' at (1) may not be a C interoperable kind but it is BIND(C) [-Wc-binding-type]

I have not yet looked into that.

> +program main
> +  use iso_c_binding
> +  use openacc
> +  use acc_routines
> +  implicit none (type, external)
> +
> +  integer :: gangs_min, gangs_max
> +  integer :: workers_min, workers_max
> +  integer :: vectors_min, vectors_max
> +  integer :: gangs_actual, workers_actual, vectors_actual
> +  integer :: i, j, k
> +
> +  call acc_init (acc_device_default)
> +
> +  ! Serial OpenACC constructs must get launched as 1 x 1 x 1.
> +  gangs_min = huge(gangs_min)
> +  workers_min = huge(workers_min)
> +  vectors_min = huge(vectors_min)
> +  gangs_max = -huge(gangs_max) - 1  ! INT_MIN
> +  workers_max = -huge(gangs_max) - 1
> +  vectors_max = -huge(gangs_max) - 1

Indeed the C/C++ initializes '*_min' variables with 'INT_MAX', and
'*_max' variables with 'INT_MIN'.  Is the above the generic Fortran
counter part for that?

> +  gangs_actual = 1
> +  workers_actual = 1
> +  vectors_actual = 1
> +
> +  !$acc serial ! { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } }
> +    if (acc_on_device (acc_device_nvidia)) then
> +      ! The GCC nvptx back end enforces vector_length (32).
> +      vectors_actual = 32
> +    elseif (acc_on_device (acc_device_gcn)) then
> +      ! AMD GCN relies on the autovectorizer for the vector dimension:
> +      ! the loop below isn't likely to be vectorized, so vectors_actual
> +      ! is effectively 1.
> +      vectors_actual = 1

We're told:

    [...]/libgomp.oacc-fortran/serial-dims.f90:53:41: Error: Symbol 'acc_device_gcn' at (1) has no IMPLICIT type; did you mean 'acc_device_kind'?

AMD GCN offloading support doesn't exist on trunk yet, so removed that
here.

> +    elseif (.not. acc_on_device (acc_device_host)) then
> +      stop 1
> +    end if
> +
> +!$acc loop gang &
> +!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
> +!$acc & reduction (max: gangs_max, workers_max, vectors_max)
> +    do i = 100 * gangs_actual, -99 * gangs_actual, -1
> +!$acc loop worker &
> +!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
> +!$acc & reduction (max: gangs_max, workers_max, vectors_max)
> +      do j = 100 * workers_actual, -99 * workers_actual, -1
> +!$acc loop vector &
> +!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
> +!$acc & reduction (max: gangs_max, workers_max, vectors_max)
> +        do k = 100 * vectors_actual, -99 * vectors_actual, -1
> +          gangs_min = acc_gang ();
> +          gangs_max = acc_gang ();
> +          workers_min = acc_worker ();
> +          workers_max = acc_worker ();
> +          vectors_min = acc_vector ();
> +          vectors_max = acc_vector ();
> +       end do
> +     end do
> +   end do
> +  if (gangs_min /= 0 .or. gangs_max /= gangs_actual - 1 &
> +      .or. workers_min /= 0 .or. workers_max /= workers_actual - 1 &
> +      .or. vectors_min /= 0 .or. vectors_max /= vectors_actual - 1) &
> +    stop 2
> +!$acc end serial
> +
> +end program main


Grüße
 Thomas

Patch
diff mbox series

diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 9fee84b22383..158154ec1294 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1291,6 +1291,7 @@  static const struct omp_pragma_def oacc_pragmas[] = {
   { "loop", PRAGMA_OACC_LOOP },
   { "parallel", PRAGMA_OACC_PARALLEL },
   { "routine", PRAGMA_OACC_ROUTINE },
+  { "serial", PRAGMA_OACC_SERIAL },
   { "update", PRAGMA_OACC_UPDATE },
   { "wait", PRAGMA_OACC_WAIT }
 };
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index e0aa774555a3..bfe681bb430a 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -38,6 +38,7 @@  enum pragma_kind {
   PRAGMA_OACC_LOOP,
   PRAGMA_OACC_PARALLEL,
   PRAGMA_OACC_ROUTINE,
+  PRAGMA_OACC_SERIAL,
   PRAGMA_OACC_UPDATE,
   PRAGMA_OACC_WAIT,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 4f044127a7e2..f5d217d0b7a4 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -16060,6 +16060,11 @@  c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
    # pragma acc parallel oacc-parallel-clause[optseq] new-line
      structured-block
 
+   OpenACC 2.6:
+
+   # pragma acc serial oacc-serial-clause[optseq] new-line
+     structured-block
+
    LOC is the location of the #pragma token.
 */
 
@@ -16096,10 +16101,24 @@  c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR_LENGTH)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
 
+#define OACC_SERIAL_CLAUSE_MASK					\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRIVATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_FIRSTPRIVATE)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_REDUCTION)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
 static tree
-c_parser_oacc_kernels_parallel (location_t loc, c_parser *parser,
-				enum pragma_kind p_kind, char *p_name,
-				bool *if_p)
+c_parser_oacc_compute (location_t loc, c_parser *parser,
+		       enum pragma_kind p_kind, char *p_name, bool *if_p)
 {
   omp_clause_mask mask;
   enum tree_code code;
@@ -16115,6 +16134,11 @@  c_parser_oacc_kernels_parallel (location_t loc, c_parser *parser,
       mask = OACC_PARALLEL_CLAUSE_MASK;
       code = OACC_PARALLEL;
       break;
+    case PRAGMA_OACC_SERIAL:
+      strcat (p_name, " serial");
+      mask = OACC_SERIAL_CLAUSE_MASK;
+      code = OACC_SERIAL;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -20578,9 +20602,9 @@  c_parser_omp_construct (c_parser *parser, bool *if_p)
       break;
     case PRAGMA_OACC_KERNELS:
     case PRAGMA_OACC_PARALLEL:
+    case PRAGMA_OACC_SERIAL:
       strcpy (p_name, "#pragma acc");
-      stmt = c_parser_oacc_kernels_parallel (loc, parser, p_kind, p_name,
-					     if_p);
+      stmt = c_parser_oacc_compute (loc, parser, p_kind, p_name, if_p);
       break;
     case PRAGMA_OACC_LOOP:
       strcpy (p_name, "#pragma acc");
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 20fddc57825a..8c79b0484fce 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6986,6 +6986,7 @@  potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
     case OMP_DEPOBJ:
     case OACC_PARALLEL:
     case OACC_KERNELS:
+    case OACC_SERIAL:
     case OACC_DATA:
     case OACC_HOST_DATA:
     case OACC_LOOP:
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 7138aebebced..c45bfccf6e61 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -40150,6 +40150,10 @@  cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
 
    # pragma acc parallel oacc-parallel-clause[optseq] new-line
      structured-block
+
+   OpenACC 2.6:
+
+   # pragma acc serial oacc-serial-clause[optseq] new-line
 */
 
 #define OACC_KERNELS_CLAUSE_MASK					\
@@ -40185,9 +40189,24 @@  cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR_LENGTH)       \
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
 
+#define OACC_SERIAL_CLAUSE_MASK						\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRIVATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_FIRSTPRIVATE)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_REDUCTION)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
 static tree
-cp_parser_oacc_kernels_parallel (cp_parser *parser, cp_token *pragma_tok,
-				 char *p_name, bool *if_p)
+cp_parser_oacc_compute (cp_parser *parser, cp_token *pragma_tok,
+			char *p_name, bool *if_p)
 {
   omp_clause_mask mask;
   enum tree_code code;
@@ -40203,6 +40222,11 @@  cp_parser_oacc_kernels_parallel (cp_parser *parser, cp_token *pragma_tok,
       mask = OACC_PARALLEL_CLAUSE_MASK;
       code = OACC_PARALLEL;
       break;
+    case PRAGMA_OACC_SERIAL:
+      strcat (p_name, " serial");
+      mask = OACC_SERIAL_CLAUSE_MASK;
+      code = OACC_SERIAL;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -42022,9 +42046,9 @@  cp_parser_omp_construct (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
       break;
     case PRAGMA_OACC_KERNELS:
     case PRAGMA_OACC_PARALLEL:
+    case PRAGMA_OACC_SERIAL:
       strcpy (p_name, "#pragma acc");
-      stmt = cp_parser_oacc_kernels_parallel (parser, pragma_tok, p_name,
-					      if_p);
+      stmt = cp_parser_oacc_compute (parser, pragma_tok, p_name, if_p);
       break;
     case PRAGMA_OACC_LOOP:
       strcpy (p_name, "#pragma acc");
@@ -42691,8 +42715,9 @@  cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
     case PRAGMA_OACC_DATA:
     case PRAGMA_OACC_HOST_DATA:
     case PRAGMA_OACC_KERNELS:
-    case PRAGMA_OACC_PARALLEL:
     case PRAGMA_OACC_LOOP:
+    case PRAGMA_OACC_PARALLEL:
+    case PRAGMA_OACC_SERIAL:
     case PRAGMA_OMP_ATOMIC:
     case PRAGMA_OMP_CRITICAL:
     case PRAGMA_OMP_DISTRIBUTE:
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 8bacb3952ff2..5a0efaa86c8b 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17991,6 +17991,7 @@  tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 
     case OACC_KERNELS:
     case OACC_PARALLEL:
+    case OACC_SERIAL:
       tmp = tsubst_omp_clauses (OMP_CLAUSES (t), C_ORT_ACC, args, complain,
 				in_decl);
       stmt = begin_omp_parallel ();
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 94e339c15ee8..badaaec38979 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -2388,6 +2388,7 @@  compilation.
 @tindex OACC_KERNELS
 @tindex OACC_LOOP
 @tindex OACC_PARALLEL
+@tindex OACC_SERIAL
 @tindex OACC_UPDATE
 
 All the statements starting with @code{OACC_} represent directives and
@@ -2432,6 +2433,10 @@  See the description of the @code{OMP_FOR} code.
 
 Represents @code{#pragma acc parallel [clause1 @dots{} clauseN]}.
 
+@item OACC_SERIAL
+
+Represents @code{#pragma acc serial [clause1 @dots{} clauseN]}.
+
 @item OACC_UPDATE
 
 Represents @code{#pragma acc update [clause1 @dots{} clauseN]}.
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 9d7aad19e2f5..253fe15b201d 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1654,6 +1654,8 @@  show_omp_node (int level, gfc_code *c)
     case EXEC_OACC_PARALLEL: name = "PARALLEL"; is_oacc = true; break;
     case EXEC_OACC_KERNELS_LOOP: name = "KERNELS LOOP"; is_oacc = true; break;
     case EXEC_OACC_KERNELS: name = "KERNELS"; is_oacc = true; break;
+    case EXEC_OACC_SERIAL_LOOP: name = "SERIAL LOOP"; is_oacc = true; break;
+    case EXEC_OACC_SERIAL: name = "SERIAL"; is_oacc = true; break;
     case EXEC_OACC_DATA: name = "DATA"; is_oacc = true; break;
     case EXEC_OACC_HOST_DATA: name = "HOST_DATA"; is_oacc = true; break;
     case EXEC_OACC_LOOP: name = "LOOP"; is_oacc = true; break;
@@ -1729,6 +1731,8 @@  show_omp_node (int level, gfc_code *c)
     case EXEC_OACC_PARALLEL:
     case EXEC_OACC_KERNELS_LOOP:
     case EXEC_OACC_KERNELS:
+    case EXEC_OACC_SERIAL_LOOP:
+    case EXEC_OACC_SERIAL:
     case EXEC_OACC_DATA:
     case EXEC_OACC_HOST_DATA:
     case EXEC_OACC_LOOP:
@@ -2918,6 +2922,8 @@  show_code_node (int level, gfc_code *c)
     case EXEC_OACC_PARALLEL:
     case EXEC_OACC_KERNELS_LOOP:
     case EXEC_OACC_KERNELS:
+    case EXEC_OACC_SERIAL_LOOP:
+    case EXEC_OACC_SERIAL:
     case EXEC_OACC_DATA:
     case EXEC_OACC_HOST_DATA:
     case EXEC_OACC_LOOP:
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 920acdafc6b7..e962db59bc59 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -223,7 +223,8 @@  enum gfc_statement
   ST_OACC_END_DATA, ST_OACC_HOST_DATA, ST_OACC_END_HOST_DATA, ST_OACC_LOOP,
   ST_OACC_END_LOOP, ST_OACC_DECLARE, ST_OACC_UPDATE, ST_OACC_WAIT,
   ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP,
-  ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE,
+  ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL,
+  ST_OACC_END_SERIAL, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE,
   ST_OACC_ATOMIC, ST_OACC_END_ATOMIC,
   ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC,
   ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED,
@@ -2572,11 +2573,11 @@  enum gfc_exec_op
   EXEC_BACKSPACE, EXEC_ENDFILE, EXEC_INQUIRE, EXEC_REWIND, EXEC_FLUSH,
   EXEC_FORM_TEAM, EXEC_CHANGE_TEAM, EXEC_END_TEAM, EXEC_SYNC_TEAM,
   EXEC_LOCK, EXEC_UNLOCK, EXEC_EVENT_POST, EXEC_EVENT_WAIT, EXEC_FAIL_IMAGE,
-  EXEC_OACC_KERNELS_LOOP, EXEC_OACC_PARALLEL_LOOP, EXEC_OACC_ROUTINE,
-  EXEC_OACC_PARALLEL, EXEC_OACC_KERNELS, EXEC_OACC_DATA, EXEC_OACC_HOST_DATA,
-  EXEC_OACC_LOOP, EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE,
-  EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA, EXEC_OACC_ATOMIC,
-  EXEC_OACC_DECLARE,
+  EXEC_OACC_KERNELS_LOOP, EXEC_OACC_PARALLEL_LOOP, EXEC_OACC_SERIAL_LOOP,
+  EXEC_OACC_ROUTINE, EXEC_OACC_PARALLEL, EXEC_OACC_KERNELS, EXEC_OACC_SERIAL,
+  EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE,
+  EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA,
+  EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE,
   EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER,
   EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO,
   EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE,
diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index 4a31080a2856..b5945049de55 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -2860,7 +2860,8 @@  match_exit_cycle (gfc_statement st, gfc_exec_op op)
       && o != NULL
       && o->state == COMP_OMP_STRUCTURED_BLOCK
       && (o->head->op == EXEC_OACC_LOOP
-	  || o->head->op == EXEC_OACC_PARALLEL_LOOP))
+	  || o->head->op == EXEC_OACC_PARALLEL_LOOP
+	  || o->head->op == EXEC_OACC_SERIAL_LOOP))
     {
       int collapse = 1;
       gcc_assert (o->head->next != NULL
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 611d79646458..954af72f0e07 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -147,6 +147,8 @@  match gfc_match_oacc_kernels_loop (void);
 match gfc_match_oacc_parallel (void);
 match gfc_match_oacc_parallel_loop (void);
 match gfc_match_oacc_enter_data (void);
+match gfc_match_oacc_serial (void);
+match gfc_match_oacc_serial_loop (void);
 match gfc_match_oacc_exit_data (void);
 match gfc_match_oacc_routine (void);
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index ca3427885457..198facce636d 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1964,6 +1964,15 @@  gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
    | OMP_CLAUSE_COPY | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT		      \
    | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_DEFAULT	      \
    | OMP_CLAUSE_WAIT)
+#define OACC_SERIAL_CLAUSES \
+  (omp_mask (OMP_CLAUSE_ASYNC) | OMP_CLAUSE_WAIT			      \
+   | OMP_CLAUSE_IF							      \
+   | OMP_CLAUSE_REDUCTION						      \
+   | OMP_CLAUSE_COPY | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT		      \
+   | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT				      \
+   | OMP_CLAUSE_DEVICEPTR						      \
+   | OMP_CLAUSE_PRIVATE | OMP_CLAUSE_FIRSTPRIVATE			      \
+   | OMP_CLAUSE_DEFAULT)
 #define OACC_DATA_CLAUSES \
   (omp_mask (OMP_CLAUSE_IF) | OMP_CLAUSE_DEVICEPTR  | OMP_CLAUSE_COPY	      \
    | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT | OMP_CLAUSE_CREATE		      \
@@ -2038,6 +2047,21 @@  gfc_match_oacc_kernels (void)
 }
 
 
+match
+gfc_match_oacc_serial_loop (void)
+{
+  return match_acc (EXEC_OACC_SERIAL_LOOP,
+		    OACC_SERIAL_CLAUSES | OACC_LOOP_CLAUSES);
+}
+
+
+match
+gfc_match_oacc_serial (void)
+{
+  return match_acc (EXEC_OACC_SERIAL, OACC_SERIAL_CLAUSES);
+}
+
+
 match
 gfc_match_oacc_data (void)
 {
@@ -3783,6 +3807,7 @@  oacc_is_loop (gfc_code *code)
 {
   return code->op == EXEC_OACC_PARALLEL_LOOP
 	 || code->op == EXEC_OACC_KERNELS_LOOP
+	 || code->op == EXEC_OACC_SERIAL_LOOP
 	 || code->op == EXEC_OACC_LOOP;
 }
 
@@ -4626,7 +4651,9 @@  resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 				 n->sym->name, name, &n->where);
 		  }
 		if (code
-		    && (oacc_is_loop (code) || code->op == EXEC_OACC_PARALLEL))
+		    && (oacc_is_loop (code)
+			|| code->op == EXEC_OACC_PARALLEL
+			|| code->op == EXEC_OACC_SERIAL))
 		  check_array_not_assumed (n->sym, n->where, name);
 		else if (n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
 		  gfc_error ("Assumed size array %qs in %s clause at %L",
@@ -5818,6 +5845,8 @@  oacc_code_to_statement (gfc_code *code)
       return ST_OACC_PARALLEL;
     case EXEC_OACC_KERNELS:
       return ST_OACC_KERNELS;
+    case EXEC_OACC_SERIAL:
+      return ST_OACC_SERIAL;
     case EXEC_OACC_DATA:
       return ST_OACC_DATA;
     case EXEC_OACC_HOST_DATA:
@@ -5826,6 +5855,8 @@  oacc_code_to_statement (gfc_code *code)
       return ST_OACC_PARALLEL_LOOP;
     case EXEC_OACC_KERNELS_LOOP:
       return ST_OACC_KERNELS_LOOP;
+    case EXEC_OACC_SERIAL_LOOP:
+      return ST_OACC_SERIAL_LOOP;
     case EXEC_OACC_LOOP:
       return ST_OACC_LOOP;
     case EXEC_OACC_ATOMIC:
@@ -6163,6 +6194,7 @@  gfc_resolve_oacc_directive (gfc_code *code, gfc_namespace *ns ATTRIBUTE_UNUSED)
     {
     case EXEC_OACC_PARALLEL:
     case EXEC_OACC_KERNELS:
+    case EXEC_OACC_SERIAL:
     case EXEC_OACC_DATA:
     case EXEC_OACC_HOST_DATA:
     case EXEC_OACC_UPDATE:
@@ -6174,6 +6206,7 @@  gfc_resolve_oacc_directive (gfc_code *code, gfc_namespace *ns ATTRIBUTE_UNUSED)
       break;
     case EXEC_OACC_PARALLEL_LOOP:
     case EXEC_OACC_KERNELS_LOOP:
+    case EXEC_OACC_SERIAL_LOOP:
     case EXEC_OACC_LOOP:
       resolve_oacc_loop (code);
       break;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 15f6bf2937c4..1a38606682ca 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -683,6 +683,9 @@  decode_oacc_directive (void)
       matcha ("end parallel loop", gfc_match_omp_eos_error,
 	      ST_OACC_END_PARALLEL_LOOP);
       matcha ("end parallel", gfc_match_omp_eos_error, ST_OACC_END_PARALLEL);
+      matcha ("end serial loop", gfc_match_omp_eos_error,
+	      ST_OACC_END_SERIAL_LOOP);
+      matcha ("end serial", gfc_match_omp_eos_error, ST_OACC_END_SERIAL);
       matcha ("enter data", gfc_match_oacc_enter_data, ST_OACC_ENTER_DATA);
       matcha ("exit data", gfc_match_oacc_exit_data, ST_OACC_EXIT_DATA);
       break;
@@ -705,6 +708,10 @@  decode_oacc_directive (void)
     case 'r':
       match ("routine", gfc_match_oacc_routine, ST_OACC_ROUTINE);
       break;
+    case 's':
+      matcha ("serial loop", gfc_match_oacc_serial_loop, ST_OACC_SERIAL_LOOP);
+      matcha ("serial", gfc_match_oacc_serial, ST_OACC_SERIAL);
+      break;
     case 'u':
       matcha ("update", gfc_match_oacc_update, ST_OACC_UPDATE);
       break;
@@ -1583,7 +1590,8 @@  next_statement (void)
   case ST_CRITICAL: \
   case ST_OACC_PARALLEL_LOOP: case ST_OACC_PARALLEL: case ST_OACC_KERNELS: \
   case ST_OACC_DATA: case ST_OACC_HOST_DATA: case ST_OACC_LOOP: \
-  case ST_OACC_KERNELS_LOOP: case ST_OACC_ATOMIC
+  case ST_OACC_KERNELS_LOOP: case ST_OACC_SERIAL_LOOP: case ST_OACC_SERIAL: \
+  case ST_OACC_ATOMIC
 
 /* Declaration statements */
 
@@ -2157,6 +2165,18 @@  gfc_ascii_statement (gfc_statement st)
     case ST_OACC_END_KERNELS_LOOP:
       p = "!$ACC END KERNELS LOOP";
       break;
+    case ST_OACC_SERIAL_LOOP:
+      p = "!$ACC SERIAL LOOP";
+      break;
+    case ST_OACC_END_SERIAL_LOOP:
+      p = "!$ACC END SERIAL LOOP";
+      break;
+    case ST_OACC_SERIAL:
+      p = "!$ACC SERIAL";
+      break;
+    case ST_OACC_END_SERIAL:
+      p = "!$ACC END SERIAL";
+      break;
     case ST_OACC_DATA:
       p = "!$ACC DATA";
       break;
@@ -5065,6 +5085,9 @@  parse_oacc_structured_block (gfc_statement acc_st)
     case ST_OACC_KERNELS:
       acc_end_st = ST_OACC_END_KERNELS;
       break;
+    case ST_OACC_SERIAL:
+      acc_end_st = ST_OACC_END_SERIAL;
+      break;
     case ST_OACC_DATA:
       acc_end_st = ST_OACC_END_DATA;
       break;
@@ -5149,6 +5172,7 @@  parse_oacc_loop (gfc_statement acc_st)
     gfc_warning (0, "Redundant !$ACC END LOOP at %C");
   if ((acc_st == ST_OACC_PARALLEL_LOOP && st == ST_OACC_END_PARALLEL_LOOP) ||
       (acc_st == ST_OACC_KERNELS_LOOP && st == ST_OACC_END_KERNELS_LOOP) ||
+      (acc_st == ST_OACC_SERIAL_LOOP && st == ST_OACC_END_SERIAL_LOOP) ||
       (acc_st == ST_OACC_LOOP && st == ST_OACC_END_LOOP))
     {
       gcc_assert (new_st.op == EXEC_NOP);
@@ -5488,6 +5512,7 @@  parse_executable (gfc_statement st)
 
 	case ST_OACC_PARALLEL_LOOP:
 	case ST_OACC_KERNELS_LOOP:
+	case ST_OACC_SERIAL_LOOP:
 	case ST_OACC_LOOP:
 	  st = parse_oacc_loop (st);
 	  if (st == ST_IMPLIED_ENDDO)
@@ -5496,6 +5521,7 @@  parse_executable (gfc_statement st)
 
 	case ST_OACC_PARALLEL:
 	case ST_OACC_KERNELS:
+	case ST_OACC_SERIAL:
 	case ST_OACC_DATA:
 	case ST_OACC_HOST_DATA:
 	  parse_oacc_structured_block (st);
@@ -6544,6 +6570,8 @@  is_oacc (gfc_state_data *sd)
     case EXEC_OACC_PARALLEL:
     case EXEC_OACC_KERNELS_LOOP:
     case EXEC_OACC_KERNELS:
+    case EXEC_OACC_SERIAL_LOOP:
+    case EXEC_OACC_SERIAL:
     case EXEC_OACC_DATA:
     case EXEC_OACC_HOST_DATA:
     case EXEC_OACC_LOOP:
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 218c2edba57c..9b1437d70327 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -10576,6 +10576,8 @@  gfc_resolve_blocks (gfc_code *b, gfc_namespace *ns)
 	case EXEC_OACC_PARALLEL:
 	case EXEC_OACC_KERNELS_LOOP:
 	case EXEC_OACC_KERNELS:
+	case EXEC_OACC_SERIAL_LOOP:
+	case EXEC_OACC_SERIAL:
 	case EXEC_OACC_DATA:
 	case EXEC_OACC_HOST_DATA:
 	case EXEC_OACC_LOOP:
@@ -11527,6 +11529,8 @@  gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
 	    case EXEC_OACC_PARALLEL:
 	    case EXEC_OACC_KERNELS_LOOP:
 	    case EXEC_OACC_KERNELS:
+	    case EXEC_OACC_SERIAL_LOOP:
+	    case EXEC_OACC_SERIAL:
 	    case EXEC_OACC_DATA:
 	    case EXEC_OACC_HOST_DATA:
 	    case EXEC_OACC_LOOP:
@@ -11940,6 +11944,8 @@  start:
 	case EXEC_OACC_PARALLEL:
 	case EXEC_OACC_KERNELS_LOOP:
 	case EXEC_OACC_KERNELS:
+	case EXEC_OACC_SERIAL_LOOP:
+	case EXEC_OACC_SERIAL:
 	case EXEC_OACC_DATA:
 	case EXEC_OACC_HOST_DATA:
 	case EXEC_OACC_LOOP:
diff --git a/gcc/fortran/st.c b/gcc/fortran/st.c
index ee18d7aea8ad..12eed71e3a26 100644
--- a/gcc/fortran/st.c
+++ b/gcc/fortran/st.c
@@ -202,6 +202,8 @@  gfc_free_statement (gfc_code *p)
     case EXEC_OACC_PARALLEL:
     case EXEC_OACC_KERNELS_LOOP:
     case EXEC_OACC_KERNELS:
+    case EXEC_OACC_SERIAL_LOOP:
+    case EXEC_OACC_SERIAL:
     case EXEC_OACC_DATA:
     case EXEC_OACC_HOST_DATA:
     case EXEC_OACC_LOOP:
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 14a3c3e42843..0d5a5a9615d6 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3141,7 +3141,7 @@  gfc_trans_omp_code (gfc_code *code, bool force_empty)
 }
 
 /* Trans OpenACC directives. */
-/* parallel, kernels, data and host_data. */
+/* parallel, serial, kernels, data and host_data. */
 static tree
 gfc_trans_oacc_construct (gfc_code *code)
 {
@@ -3157,6 +3157,9 @@  gfc_trans_oacc_construct (gfc_code *code)
       case EXEC_OACC_KERNELS:
 	construct_code = OACC_KERNELS;
 	break;
+      case EXEC_OACC_SERIAL:
+	construct_code = OACC_SERIAL;
+	break;
       case EXEC_OACC_DATA:
 	construct_code = OACC_DATA;
 	break;
@@ -3964,7 +3967,8 @@  gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   return gfc_finish_block (&block);
 }
 
-/* parallel loop and kernels loop. */
+/* Combined OpenACC parallel loop, kernels loop and serial loop. */
+
 static tree
 gfc_trans_oacc_combined_directive (gfc_code *code)
 {
@@ -3982,6 +3986,9 @@  gfc_trans_oacc_combined_directive (gfc_code *code)
       case EXEC_OACC_KERNELS_LOOP:
 	construct_code = OACC_KERNELS;
 	break;
+      case EXEC_OACC_SERIAL_LOOP:
+	construct_code = OACC_SERIAL;
+	break;
       default:
 	gcc_unreachable ();
     }
@@ -5214,9 +5221,11 @@  gfc_trans_oacc_directive (gfc_code *code)
     {
     case EXEC_OACC_PARALLEL_LOOP:
     case EXEC_OACC_KERNELS_LOOP:
+    case EXEC_OACC_SERIAL_LOOP:
       return gfc_trans_oacc_combined_directive (code);
     case EXEC_OACC_PARALLEL:
     case EXEC_OACC_KERNELS:
+    case EXEC_OACC_SERIAL:
     case EXEC_OACC_DATA:
     case EXEC_OACC_HOST_DATA:
       return gfc_trans_oacc_construct (code);
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 2f878f6b1185..d9b278199b75 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -2137,6 +2137,8 @@  trans_code (gfc_code * code, tree cond)
 	case EXEC_OACC_KERNELS_LOOP:
 	case EXEC_OACC_PARALLEL:
 	case EXEC_OACC_PARALLEL_LOOP:
+	case EXEC_OACC_SERIAL:
+	case EXEC_OACC_SERIAL_LOOP:
 	case EXEC_OACC_ENTER_DATA:
 	case EXEC_OACC_EXIT_DATA:
 	case EXEC_OACC_ATOMIC:
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2d5ece068053..f59cc2aa3188 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1676,6 +1676,9 @@  dump_gimple_omp_target (pretty_printer *buffer, gomp_target *gs,
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
       kind = " oacc_parallel";
       break;
+    case GF_OMP_TARGET_KIND_OACC_SERIAL:
+      kind = " oacc_serial";
+      break;
     case GF_OMP_TARGET_KIND_OACC_DATA:
       kind = " oacc_data";
       break;
diff --git a/gcc/gimple.h b/gcc/gimple.h
index cf1f8da5ae24..83a449be3643 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -182,6 +182,7 @@  enum gf_mask {
     GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9,
     GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
     GF_OMP_TARGET_KIND_OACC_HOST_DATA = 11,
+    GF_OMP_TARGET_KIND_OACC_SERIAL = 12,
     GF_OMP_TEAMS_GRID_PHONY	= 1 << 0,
     GF_OMP_TEAMS_HOST		= 1 << 1,
 
@@ -6476,6 +6477,7 @@  is_gimple_omp_oacc (const gimple *stmt)
 	{
 	case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 	case GF_OMP_TARGET_KIND_OACC_KERNELS:
+	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_UPDATE:
 	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
@@ -6505,6 +6507,7 @@  is_gimple_omp_offloaded (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_REGION:
 	case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 	case GF_OMP_TARGET_KIND_OACC_KERNELS:
+	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	  return true;
 	default:
 	  return false;
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 5fa0ba6dda60..94a69643aaab 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -161,6 +161,7 @@  enum omp_region_type
   ORT_ACC_DATA	= ORT_ACC | ORT_TARGET_DATA, /* Data construct.  */
   ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
   ORT_ACC_KERNELS  = ORT_ACC | ORT_TARGET | 2,  /* Kernels construct.  */
+  ORT_ACC_SERIAL = ORT_ACC | ORT_TARGET | 4,  /* Serial construct.  */
   ORT_ACC_HOST_DATA = ORT_ACC | ORT_TARGET_DATA | 2,  /* Host data.  */
 
   /* Dummy OpenMP region, used to disable expansion of
@@ -5551,6 +5552,7 @@  is_gimple_stmt (tree t)
     case STATEMENT_LIST:
     case OACC_PARALLEL:
     case OACC_KERNELS:
+    case OACC_SERIAL:
     case OACC_DATA:
     case OACC_HOST_DATA:
     case OACC_DECLARE:
@@ -7289,7 +7291,8 @@  oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
       break;
 
     case ORT_ACC_PARALLEL:
-      rkind = "parallel";
+    case ORT_ACC_SERIAL:
+      rkind = ctx->region_type == ORT_ACC_PARALLEL ? "parallel" : "serial";
 
       if (is_private)
 	flags |= GOVD_FIRSTPRIVATE;
@@ -10101,7 +10104,8 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	  /* Data clauses associated with acc parallel reductions must be
 	     compatible with present_or_copy.  Warn and adjust the clause
 	     if that is not the case.  */
-	  if (ctx->region_type == ORT_ACC_PARALLEL)
+	  if (ctx->region_type == ORT_ACC_PARALLEL
+	      || ctx->region_type == ORT_ACC_SERIAL)
 	    {
 	      tree t = DECL_P (decl) ? decl : TREE_OPERAND (decl, 0);
 	      n = NULL;
@@ -10277,7 +10281,8 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	  decl = OMP_CLAUSE_DECL (c);
 	  /* OpenACC reductions need a present_or_copy data clause.
 	     Add one if necessary.  Emit error when the reduction is private.  */
-	  if (ctx->region_type == ORT_ACC_PARALLEL)
+	  if (ctx->region_type == ORT_ACC_PARALLEL
+	      || ctx->region_type == ORT_ACC_SERIAL)
 	    {
 	      n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
 	      if (n->value & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE))
@@ -12529,6 +12534,9 @@  gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     case OACC_PARALLEL:
       ort = ORT_ACC_PARALLEL;
       break;
+    case OACC_SERIAL:
+      ort = ORT_ACC_SERIAL;
+      break;
     case OACC_DATA:
       ort = ORT_ACC_DATA;
       break;
@@ -12612,6 +12620,10 @@  gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
       stmt = gimple_build_omp_target (body, GF_OMP_TARGET_KIND_OACC_PARALLEL,
 				      OMP_CLAUSES (expr));
       break;
+    case OACC_SERIAL:
+      stmt = gimple_build_omp_target (body, GF_OMP_TARGET_KIND_OACC_SERIAL,
+				      OMP_CLAUSES (expr));
+      break;
     case OMP_SECTIONS:
       stmt = gimple_build_omp_sections (body, OMP_CLAUSES (expr));
       break;
@@ -13870,6 +13882,7 @@  gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	case OACC_DATA:
 	case OACC_KERNELS:
 	case OACC_PARALLEL:
+	case OACC_SERIAL:
 	case OMP_SECTIONS:
 	case OMP_SINGLE:
 	case OMP_TARGET:
@@ -14286,6 +14299,7 @@  gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 		  && code != EH_ELSE_EXPR
 		  && code != OACC_PARALLEL
 		  && code != OACC_KERNELS
+		  && code != OACC_SERIAL
 		  && code != OACC_DATA
 		  && code != OACC_HOST_DATA
 		  && code != OACC_DECLARE
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index eadff6e50f86..d242f4e1ae99 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7901,19 +7901,22 @@  expand_omp_target (struct omp_region *region)
   gimple *stmt;
   edge e;
   bool offloaded, data_region;
+  int target_kind;
 
   entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
+  target_kind = gimple_omp_target_kind (entry_stmt);
   new_bb = region->entry;
 
   offloaded = is_gimple_omp_offloaded (entry_stmt);
-  switch (gimple_omp_target_kind (entry_stmt))
+  switch (target_kind)
     {
     case GF_OMP_TARGET_KIND_REGION:
     case GF_OMP_TARGET_KIND_UPDATE:
     case GF_OMP_TARGET_KIND_ENTER_DATA:
     case GF_OMP_TARGET_KIND_EXIT_DATA:
-    case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+    case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
@@ -7944,16 +7947,28 @@  expand_omp_target (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+  switch (target_kind)
     {
+    case GF_OMP_TARGET_KIND_OACC_KERNELS:
       mark_loops_in_oacc_kernels_region (region->entry, region->exit);
 
-      /* Further down, both OpenACC kernels and OpenACC parallel constructs
-	 will be mappted to BUILT_IN_GOACC_PARALLEL, and to distinguish the
-	 two, there is an "oacc kernels" attribute set for OpenACC kernels.  */
+      /* Further down, all OpenACC compute constructs will be mapped to
+	 BUILT_IN_GOACC_PARALLEL, and to distinguish between them, there
+	 is an "oacc kernels" attribute set for OpenACC kernels.  */
       DECL_ATTRIBUTES (child_fn)
 	= tree_cons (get_identifier ("oacc kernels"),
 		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
+    case GF_OMP_TARGET_KIND_OACC_SERIAL:
+      /* Further down, all OpenACC compute constructs will be mapped to
+	 BUILT_IN_GOACC_PARALLEL, and to distinguish between them, there
+	 is an "oacc serial" attribute set for OpenACC serial.  */
+      DECL_ATTRIBUTES (child_fn)
+	= tree_cons (get_identifier ("oacc serial"),
+		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
+    default:
+      break;
     }
 
   if (offloaded)
@@ -8158,6 +8173,7 @@  expand_omp_target (struct omp_region *region)
       break;
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+    case GF_OMP_TARGET_KIND_OACC_SERIAL:
       start_ix = BUILT_IN_GOACC_PARALLEL;
       break;
     case GF_OMP_TARGET_KIND_OACC_DATA:
@@ -8352,7 +8368,18 @@  expand_omp_target (struct omp_region *region)
 	args.quick_push (get_target_arguments (&gsi, entry_stmt));
       break;
     case BUILT_IN_GOACC_PARALLEL:
-      oacc_set_fn_attrib (child_fn, clauses, &args);
+      if (lookup_attribute ("oacc serial", DECL_ATTRIBUTES (child_fn)) != NULL)
+	{
+	  tree dims = NULL_TREE;
+	  unsigned int ix;
+
+	  /* For serial constructs we set all dimensions to 1.  */
+	  for (ix = GOMP_DIM_MAX; ix--;)
+	    dims = tree_cons (NULL_TREE, integer_one_node, dims);
+	  oacc_replace_fn_attrib (child_fn, dims);
+	}
+      else
+	oacc_set_fn_attrib (child_fn, clauses, &args);
       tagging = true;
       /* FALLTHRU */
     case BUILT_IN_GOACC_ENTER_EXIT_DATA:
@@ -8911,8 +8938,9 @@  build_omp_regions_1 (basic_block bb, struct omp_region *parent,
 		{
 		case GF_OMP_TARGET_KIND_REGION:
 		case GF_OMP_TARGET_KIND_DATA:
-		case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 		case GF_OMP_TARGET_KIND_OACC_KERNELS:
+		case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+		case GF_OMP_TARGET_KIND_OACC_SERIAL:
 		case GF_OMP_TARGET_KIND_OACC_DATA:
 		case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 		  break;
@@ -9165,8 +9193,9 @@  omp_make_gimple_edges (basic_block bb, struct omp_region **region,
 	{
 	case GF_OMP_TARGET_KIND_REGION:
 	case GF_OMP_TARGET_KIND_DATA:
-	case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 	case GF_OMP_TARGET_KIND_OACC_KERNELS:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 	  break;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fa76ceba33c6..fb2ddc5f354a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -185,15 +185,17 @@  static tree scan_omp_1_op (tree *, int *, void *);
       *handled_ops_p = false; \
       break;
 
-/* Return true if CTX corresponds to an oacc parallel region.  */
+/* Return true if CTX corresponds to an oacc parallel or serial region.  */
 
 static bool
-is_oacc_parallel (omp_context *ctx)
+is_oacc_parallel_or_serial (omp_context *ctx)
 {
   enum gimple_code outer_type = gimple_code (ctx->stmt);
   return ((outer_type == GIMPLE_OMP_TARGET)
-	  && (gimple_omp_target_kind (ctx->stmt)
-	      == GF_OMP_TARGET_KIND_OACC_PARALLEL));
+	  && ((gimple_omp_target_kind (ctx->stmt)
+	       == GF_OMP_TARGET_KIND_OACC_PARALLEL)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_SERIAL)));
 }
 
 /* Return true if CTX corresponds to an oacc kernels region.  */
@@ -1149,7 +1151,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  goto do_private;
 
 	case OMP_CLAUSE_REDUCTION:
-	  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
+	  if (is_oacc_parallel_or_serial (ctx) || is_oacc_kernels (ctx))
 	    ctx->local_reduction_clauses
 	      = tree_cons (NULL, c, ctx->local_reduction_clauses);
 	  /* FALLTHRU */
@@ -2391,7 +2393,7 @@  scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
     {
       omp_context *tgt = enclosing_target_ctx (outer_ctx);
 
-      if (!tgt || is_oacc_parallel (tgt))
+      if (!tgt || is_oacc_parallel_or_serial (tgt))
 	for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
 	  {
 	    char const *check = NULL;
@@ -2945,6 +2947,7 @@  check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 		  {
 		  case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 		  case GF_OMP_TARGET_KIND_OACC_KERNELS:
+		  case GF_OMP_TARGET_KIND_OACC_SERIAL:
 		    ok = true;
 		    break;
 
@@ -3393,6 +3396,7 @@  check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	      stmt_name = "target exit data"; break;
 	    case GF_OMP_TARGET_KIND_OACC_PARALLEL: stmt_name = "parallel"; break;
 	    case GF_OMP_TARGET_KIND_OACC_KERNELS: stmt_name = "kernels"; break;
+	    case GF_OMP_TARGET_KIND_OACC_SERIAL: stmt_name = "serial"; break;
 	    case GF_OMP_TARGET_KIND_OACC_DATA: stmt_name = "data"; break;
 	    case GF_OMP_TARGET_KIND_OACC_UPDATE: stmt_name = "update"; break;
 	    case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
@@ -3410,6 +3414,8 @@  check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	      ctx_stmt_name = "parallel"; break;
 	    case GF_OMP_TARGET_KIND_OACC_KERNELS:
 	      ctx_stmt_name = "kernels"; break;
+	    case GF_OMP_TARGET_KIND_OACC_SERIAL:
+	      ctx_stmt_name = "serial"; break;
 	    case GF_OMP_TARGET_KIND_OACC_DATA: ctx_stmt_name = "data"; break;
 	    case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 	      ctx_stmt_name = "host_data"; break;
@@ -6711,8 +6717,10 @@  lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		    break;
 
 		  case GIMPLE_OMP_TARGET:
-		    if (gimple_omp_target_kind (probe->stmt)
-			!= GF_OMP_TARGET_KIND_OACC_PARALLEL)
+		    if ((gimple_omp_target_kind (probe->stmt)
+			 != GF_OMP_TARGET_KIND_OACC_PARALLEL)
+			&& (gimple_omp_target_kind (probe->stmt)
+			    != GF_OMP_TARGET_KIND_OACC_SERIAL))
 		      goto do_lookup;
 
 		    cls = gimple_omp_target_clauses (probe->stmt);
@@ -7518,7 +7526,7 @@  lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
 
   /* In a parallel region, loops are implicitly INDEPENDENT.  */
   omp_context *tgt = enclosing_target_ctx (ctx);
-  if (!tgt || is_oacc_parallel (tgt))
+  if (!tgt || is_oacc_parallel_or_serial (tgt))
     tag |= OLF_INDEPENDENT;
 
   if (tag & OLF_TILE)
@@ -11357,6 +11365,7 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GF_OMP_TARGET_KIND_EXIT_DATA:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
+    case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
@@ -11531,7 +11540,7 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	break;
 
       case OMP_CLAUSE_FIRSTPRIVATE:
-	if (is_oacc_parallel (ctx))
+	if (is_oacc_parallel_or_serial (ctx))
 	  goto oacc_firstprivate;
 	map_cnt++;
 	var = OMP_CLAUSE_DECL (c);
@@ -11905,7 +11914,7 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	    break;
 
 	  case OMP_CLAUSE_FIRSTPRIVATE:
-	    if (is_oacc_parallel (ctx))
+	    if (is_oacc_parallel_or_serial (ctx))
 	      goto oacc_firstprivate_map;
 	    ovar = OMP_CLAUSE_DECL (c);
 	    if (omp_is_reference (ovar))
@@ -12439,7 +12448,7 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       gimple_seq fork_seq = NULL;
       gimple_seq join_seq = NULL;
 
-      if (is_oacc_parallel (ctx))
+      if (is_oacc_parallel_or_serial (ctx))
 	{
 	  /* If there are reductions on the offloaded region itself, treat
 	     them as a dummy GANG loop.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/serial-dims.c b/gcc/testsuite/c-c++-common/goacc/serial-dims.c
new file mode 100644
index 000000000000..41698d279c98
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/serial-dims.c
@@ -0,0 +1,12 @@ 
+/* Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
+   num_workers, vector_length with the serial construct.  */
+
+void f(void)
+{
+#pragma acc serial num_gangs (1) /* { dg-error "'num_gangs' is not valid for '#pragma acc serial'" } */
+  ;
+#pragma acc serial num_workers (1) /* { dg-error "'num_workers' is not valid for '#pragma acc serial'" } */
+  ;
+#pragma acc serial vector_length (1) /* { dg-error "'vector_length' is not valid for '#pragma acc serial'" } */
+  ;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90 b/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90
new file mode 100644
index 000000000000..72b4a8361776
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90
@@ -0,0 +1,40 @@ 
+! Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
+! num_workers, vector_length with the serial construct.
+
+subroutine s()
+  integer :: i
+  !$acc parallel
+  !$acc end parallel
+
+  !$acc parallel loop
+  do i = 1, 5
+  end do
+
+  !$acc parallel loop
+  do i = 1, 5
+  end do
+  !$acc end parallel loop
+
+  !$acc serial loop
+  do i = 1, 5
+  end do
+
+  !$acc serial loop
+  do i = 1, 5
+  end do
+  !$acc end serial loop
+
+  !$acc serial
+  !$acc end serial
+end subroutine s
+
+subroutine f()
+!$acc serial num_gangs (1)  ! { dg-error "Failed to match clause at" }
+!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
+
+!$acc serial num_workers (1)  ! { dg-error "Failed to match clause at" }
+!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
+
+!$acc serial vector_length (1)  ! { dg-error "Failed to match clause at" }
+!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
+end subroutine f
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 53b3f55a3e6a..1cf7a9121336 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -3223,6 +3223,10 @@  dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
       pp_string (pp, "#pragma acc kernels");
       goto dump_omp_clauses_body;
 
+    case OACC_SERIAL:
+      pp_string (pp, "#pragma acc serial");
+      goto dump_omp_clauses_body;
+
     case OACC_DATA:
       pp_string (pp, "#pragma acc data");
       dump_omp_clauses (pp, OACC_DATA_CLAUSES (node), spc, flags);
diff --git a/gcc/tree.def b/gcc/tree.def
index fb6e7344fa6b..e8bb4f37f802 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1095,6 +1095,12 @@  DEFTREECODE (OACC_PARALLEL, "oacc_parallel", tcc_statement, 2)
 
 DEFTREECODE (OACC_KERNELS, "oacc_kernels", tcc_statement, 2)
 
+/* OpenACC - #pragma acc serial [clause1 ... clauseN]
+   Operand 0: OMP_BODY: Code to be executed sequentially.
+   Operand 1: OMP_CLAUSES: List of clauses.  */
+
+DEFTREECODE (OACC_SERIAL, "oacc_serial", tcc_statement, 2)
+
 /* OpenACC - #pragma acc data [clause1 ... clauseN]
    Operand 0: OACC_DATA_BODY: Data construct body.
    Operand 1: OACC_DATA_CLAUSES: List of clauses.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
new file mode 100644
index 000000000000..bb91c9221f89
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
@@ -0,0 +1,92 @@ 
+/* OpenACC dimensions with the serial construct.  */
+
+#include <limits.h>
+#include <openacc.h>
+#include <gomp-constants.h>
+
+/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper
+   not behaving as expected for -O0.  */
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
+{
+  if (acc_on_device ((int) acc_device_host))
+    return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+    return __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+  else
+    __builtin_abort ();
+}
+
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
+{
+  if (acc_on_device ((int) acc_device_host))
+    return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+    return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+  else
+    __builtin_abort ();
+}
+
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
+{
+  if (acc_on_device ((int) acc_device_host))
+    return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+    return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+  else
+    __builtin_abort ();
+}
+
+
+int main ()
+{
+  acc_init (acc_device_default);
+
+  /* Serial OpenACC constructs must get launched as 1 x 1 x 1.  */
+  {
+    int gangs_min, gangs_max;
+    int workers_min, workers_max;
+    int vectors_min, vectors_max;
+    int gangs_actual, workers_actual, vectors_actual;
+    int i, j, k;
+
+    gangs_min = workers_min = vectors_min = INT_MAX;
+    gangs_max = workers_max = vectors_max = INT_MIN;
+    gangs_actual = workers_actual = vectors_actual = 1;
+#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */
+    {
+      if (acc_on_device (acc_device_nvidia))
+	{
+	  /* The GCC nvptx back end enforces vector_length (32).  */
+	  vectors_actual = 32;
+	}
+      else if (!acc_on_device (acc_device_host))
+	__builtin_abort ();
+#pragma acc loop gang \
+  reduction (min: gangs_min, workers_min, vectors_min) \
+  reduction (max: gangs_max, workers_max, vectors_max)
+      for (i = 100 * gangs_actual; i > -100 * gangs_actual; i--)
+#pragma acc loop worker \
+  reduction (min: gangs_min, workers_min, vectors_min) \
+  reduction (max: gangs_max, workers_max, vectors_max)
+	for (j = 100 * workers_actual; j > -100 * workers_actual; j--)
+#pragma acc loop vector \
+  reduction (min: gangs_min, workers_min, vectors_min) \
+  reduction (max: gangs_max, workers_max, vectors_max)
+	  for (k = 100 * vectors_actual; k > -100 * vectors_actual; k--)
+	    {
+	      gangs_min = gangs_max = acc_gang ();
+	      workers_min = workers_max = acc_worker ();
+	      vectors_min = vectors_max = acc_vector ();
+	    }
+      if (gangs_min != 0 || gangs_max != gangs_actual - 1
+	  || workers_min != 0 || workers_max != workers_actual - 1
+	  || vectors_min != 0 || vectors_max != vectors_actual - 1)
+	__builtin_abort ();
+    }
+  }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
new file mode 100644
index 000000000000..45c260510c29
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
@@ -0,0 +1,41 @@ 
+/* OpenACC dimensions with the serial construct.  */
+/* Used by serial-dims.f90.  */
+
+#include <limits.h>
+#include <openacc.h>
+#include <gomp-constants.h>
+
+/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper
+   not behaving as expected for -O0.  */
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
+{
+  if (acc_on_device ((int) acc_device_host))
+    return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+    return __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+  else
+    __builtin_abort ();
+}
+
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
+{
+  if (acc_on_device ((int) acc_device_host))
+    return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+    return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+  else
+    __builtin_abort ();
+}
+
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
+{
+  if (acc_on_device ((int) acc_device_host))
+    return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+    return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+  else
+    __builtin_abort ();
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90 b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90
new file mode 100644
index 000000000000..25c933629045
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90
@@ -0,0 +1,89 @@ 
+! OpenACC dimensions with the serial construct.
+
+! { dg-additional-sources serial-dims-aux.c }
+! { dg-warning "command line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
+
+module acc_routines
+  implicit none (type, external)
+
+  interface
+    integer function acc_gang() bind(C)
+      !$acc routine seq
+    end function acc_gang
+
+    integer function acc_worker() bind(C)
+      !$acc routine seq
+    end function acc_worker
+
+    integer function acc_vector() bind(C)
+      !$acc routine seq
+    end function acc_vector
+  end interface
+end module acc_routines
+
+program main
+  use iso_c_binding
+  use openacc
+  use acc_routines
+  implicit none (type, external)
+
+  integer :: gangs_min, gangs_max
+  integer :: workers_min, workers_max
+  integer :: vectors_min, vectors_max
+  integer :: gangs_actual, workers_actual, vectors_actual
+  integer :: i, j, k
+
+  call acc_init (acc_device_default)
+
+  ! Serial OpenACC constructs must get launched as 1 x 1 x 1.
+  gangs_min = huge(gangs_min)
+  workers_min = huge(workers_min)
+  vectors_min = huge(vectors_min)
+  gangs_max = -huge(gangs_max) - 1  ! INT_MIN
+  workers_max = -huge(gangs_max) - 1
+  vectors_max = -huge(gangs_max) - 1
+  gangs_actual = 1
+  workers_actual = 1
+  vectors_actual = 1
+
+  !$acc serial ! { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } }
+    if (acc_on_device (acc_device_nvidia)) then
+      ! The GCC nvptx back end enforces vector_length (32).
+      vectors_actual = 32
+    elseif (acc_on_device (acc_device_gcn)) then
+      ! AMD GCN relies on the autovectorizer for the vector dimension:
+      ! the loop below isn't likely to be vectorized, so vectors_actual
+      ! is effectively 1.
+      vectors_actual = 1
+    elseif (.not. acc_on_device (acc_device_host)) then
+      stop 1
+    end if
+
+!$acc loop gang &
+!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
+!$acc & reduction (max: gangs_max, workers_max, vectors_max)
+    do i = 100 * gangs_actual, -99 * gangs_actual, -1
+!$acc loop worker &
+!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
+!$acc & reduction (max: gangs_max, workers_max, vectors_max)
+      do j = 100 * workers_actual, -99 * workers_actual, -1
+!$acc loop vector &
+!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
+!$acc & reduction (max: gangs_max, workers_max, vectors_max)
+        do k = 100 * vectors_actual, -99 * vectors_actual, -1
+          gangs_min = acc_gang ();
+          gangs_max = acc_gang ();
+          workers_min = acc_worker ();
+          workers_max = acc_worker ();
+          vectors_min = acc_vector ();
+          vectors_max = acc_vector ();
+       end do
+     end do
+   end do
+  if (gangs_min /= 0 .or. gangs_max /= gangs_actual - 1 &
+      .or. workers_min /= 0 .or. workers_max /= workers_actual - 1 &
+      .or. vectors_min /= 0 .or. vectors_max /= vectors_actual - 1) &
+    stop 2
+!$acc end serial
+
+end program main