diff mbox

[gomp4] OpenACC documentation for libgomp.

Message ID 56954F48.10709@codesourcery.com
State New
Headers show

Commit Message

James Norris Jan. 12, 2016, 7:08 p.m. UTC
Hi,

Backported:

2016-01-12  James Norris  <jnorris@codesourcery.com>

         * libgomp.texi: Updates for OpenACC.

from trunk.

Thanks,
Jim
diff mbox

Patch

Index: ChangeLog.gomp
===================================================================
--- ChangeLog.gomp	(revision 232292)
+++ ChangeLog.gomp	(working copy)
@@ -1,3 +1,9 @@ 
+2016-01-12  James Norris  <jnorris@codesourcery.com>
+
+	Backport from trunk:
+	2016-01-12  James Norris  <jnorris@codesourcery.com>
+	* libgomp.texi: Updates for OpenACC.
+
 2016-01-11  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: Remove
Index: libgomp.texi
===================================================================
--- libgomp.texi	(revision 232292)
+++ libgomp.texi	(working copy)
@@ -94,6 +94,14 @@ 
 @comment  better formatting.
 @comment
 @menu
+* Enabling OpenMP::                  How to enable OpenMP for your
+                                     applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+                                     The OpenMP runtime application programming
+                                     interface.
+* OpenMP Environment Variables: Environment Variables.
+                                     Influencing OpenMP runtime behavior with
+                                     environment variables.
 * Enabling OpenACC::                 How to enable OpenACC for your
                                      applications.
 * OpenACC Runtime Library Routines:: The OpenACC runtime application
@@ -104,14 +112,6 @@ 
                                      asynchronous operations.
 * OpenACC Library Interoperability:: OpenACC library interoperability with the
                                      NVIDIA CUBLAS library.
-* Enabling OpenMP::                  How to enable OpenMP for your
-                                     applications.
-* OpenMP Runtime Library Routines: Runtime Library Routines.
-                                     The OpenMP runtime application programming
-                                     interface.
-* OpenMP Environment Variables: Environment Variables.
-                                     Influencing OpenMP runtime behavior with
-                                     environment variables.
 * The libgomp ABI::                  Notes on the external libgomp ABI.
 * Reporting Bugs::                   How to report bugs in the GNU Offloading
                                      and Multi Processing Runtime Library.
@@ -126,643 +126,6 @@ 
 
 
 @c ---------------------------------------------------------------------
-@c Enabling OpenACC
-@c ---------------------------------------------------------------------
-
-@node Enabling OpenACC
-@chapter Enabling OpenACC
-
-To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
-flag @command{-fopenacc} must be specified.  This enables the OpenACC directive
-@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
-@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
-@code{!$} conditional compilation sentinels in free form and @code{c$},
-@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
-arranges for automatic linking of the OpenACC runtime library 
-(@ref{OpenACC Runtime Library Routines}).
-
-A complete description of all OpenACC directives accepted may be found in 
-the @uref{http://www.openacc.org/, OpenMP Application Programming
-Interface} manual, version 2.0.
-
-Note that this is an experimental feature, incomplete, and subject to
-change in future versions of GCC.  See
-@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
-
-
-
-@c ---------------------------------------------------------------------
-@c OpenACC Runtime Library Routines
-@c ---------------------------------------------------------------------
-
-@node OpenACC Runtime Library Routines
-@chapter OpenACC Runtime Library Routines
-
-The runtime routines described here are defined by section 3 of the OpenACC
-specifications in version 2.0.
-They have C linkage, and do not throw exceptions.
-Generally, they are available only for the host, with the exception of
-@code{acc_on_device}, which is available for both the host and the
-acceleration device.
-
-@menu
-* acc_get_num_devices::         Get number of devices for the given device type
-* acc_set_device_type::
-* acc_get_device_type::
-* acc_set_device_num::
-* acc_get_device_num::
-* acc_init::
-* acc_shutdown::
-* acc_on_device::               Whether executing on a particular device
-* acc_malloc::
-* acc_free::
-* acc_copyin::
-* acc_present_or_copyin::
-* acc_create::
-* acc_present_or_create::
-* acc_copyout::
-* acc_delete::
-* acc_update_device::
-* acc_update_self::
-* acc_map_data::
-* acc_unmap_data::
-* acc_deviceptr::
-* acc_hostptr::
-* acc_is_present::
-* acc_memcpy_to_device::
-* acc_memcpy_from_device::
-
-API routines for target platforms.
-
-* acc_get_current_cuda_device::
-* acc_get_current_cuda_context::
-* acc_get_cuda_stream::
-* acc_set_cuda_stream::
-@end menu
-
-
-
-@node acc_get_num_devices
-@section @code{acc_get_num_devices} -- Get number of devices for given device type
-@table @asis
-@item @emph{Description}
-This routine returns a value indicating the
-number of devices available for the given device type.  It determines
-the number of devices in a @emph{passive} manner.  In other words, it
-does not alter the state within the runtime environment aside from
-possibly initializing an uninitialized device.  This aspect allows
-the routine to be called without concern for altering the interaction
-with an attached accelerator device.
-
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.1.
-@end table
-
-
-
-@node acc_set_device_type
-@section @code{acc_set_device_type}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.2.
-@end table
-
-
-
-@node acc_get_device_type
-@section @code{acc_get_device_type}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.3.
-@end table
-
-
-
-@node acc_set_device_num
-@section @code{acc_set_device_num}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.4.
-@end table
-
-
-
-@node acc_get_device_num
-@section @code{acc_get_device_num}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.5.
-@end table
-
-
-
-@node acc_init
-@section @code{acc_init}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.12.
-@end table
-
-
-
-@node acc_shutdown
-@section @code{acc_shutdown}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.13.
-@end table
-
-
-
-@node acc_on_device
-@section @code{acc_on_device} -- Whether executing on a particular device
-@table @asis
-@item @emph{Description}:
-This routine tells the program whether it is executing on a particular
-device.  Based on the argument passed, GCC tries to evaluate this to a
-constant at compile time, but library functions are also provided, for
-both the host and the acceleration device.
-
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.14.
-@end table
-
-
-
-@node acc_malloc
-@section @code{acc_malloc}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.15.
-@end table
-
-
-
-@node acc_free
-@section @code{acc_free}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.16.
-@end table
-
-
-
-@node acc_copyin
-@section @code{acc_copyin}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.17.
-@end table
-
-
-
-@node acc_present_or_copyin
-@section @code{acc_present_or_copyin}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.18.
-@end table
-
-
-
-@node acc_create
-@section @code{acc_create}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.19.
-@end table
-
-
-
-@node acc_present_or_create
-@section @code{acc_present_or_create}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.20.
-@end table
-
-
-
-@node acc_copyout
-@section @code{acc_copyout}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.21.
-@end table
-
-
-
-@node acc_delete
-@section @code{acc_delete}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.22.
-@end table
-
-
-
-@node acc_update_device
-@section @code{acc_update_device}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.23.
-@end table
-
-
-
-@node acc_update_self
-@section @code{acc_update_self}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.24.
-@end table
-
-
-
-@node acc_map_data
-@section @code{acc_map_data}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.25.
-@end table
-
-
-
-@node acc_unmap_data
-@section @code{acc_unmap_data}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.26.
-@end table
-
-
-
-@node acc_deviceptr
-@section @code{acc_deviceptr}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.27.
-@end table
-
-
-
-@node acc_hostptr
-@section @code{acc_hostptr}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.28.
-@end table
-
-
-
-@node acc_is_present
-@section @code{acc_is_present}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.29.
-@end table
-
-
-
-@node acc_memcpy_to_device
-@section @code{acc_memcpy_to_device}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.30.
-@end table
-
-
-
-@node acc_memcpy_from_device
-@section @code{acc_memcpy_from_device}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-3.2.31.
-@end table
-
-
-
-@node acc_get_current_cuda_device
-@section @code{acc_get_current_cuda_device}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-A.2.1.1.
-@end table
-
-
-
-@node acc_get_current_cuda_context
-@section @code{acc_get_current_cuda_context}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-A.2.1.2.
-@end table
-
-
-
-@node acc_get_cuda_stream
-@section @code{acc_get_cuda_stream}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-A.2.1.3.
-@end table
-
-
-
-@node acc_set_cuda_stream
-@section @code{acc_set_cuda_stream}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-A.2.1.4.
-@end table
-
-
-
-@c ---------------------------------------------------------------------
-@c OpenACC Environment Variables
-@c ---------------------------------------------------------------------
-
-@node OpenACC Environment Variables
-@chapter OpenACC Environment Variables
-
-The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
-are defined by section 4 of the OpenACC specification in version 2.0.
-The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
-
-@menu
-* ACC_DEVICE_TYPE::
-* ACC_DEVICE_NUM::
-* GCC_ACC_NOTIFY::
-@end menu
-
-
-
-@node ACC_DEVICE_TYPE
-@section @code{ACC_DEVICE_TYPE}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-4.1.
-@end table
-
-
-
-@node ACC_DEVICE_NUM
-@section @code{ACC_DEVICE_NUM}
-@table @asis
-@item @emph{Reference}:
-@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
-4.2.
-@end table
-
-
-
-@node GCC_ACC_NOTIFY
-@section @code{GCC_ACC_NOTIFY}
-@table @asis
-@item @emph{Description}:
-Print debug information pertaining to the accelerator.
-@end table
-
-
-
-@c ---------------------------------------------------------------------
-@c CUDA Streams Usage
-@c ---------------------------------------------------------------------
-
-@node CUDA Streams Usage
-@chapter CUDA Streams Usage
-
-This applies to the @code{nvptx} plugin only.
-
-The library provides elements that perform asynchronous movement of
-data and asynchronous operation of computing constructs.  This
-asynchronous functionality is implemented by making use of CUDA
-streams@footnote{See "Stream Management" in "CUDA Driver API",
-TRM-06703-001, Version 5.5, July 2013, for additional information}.
-
-The primary means by which the asychronous functionality is accessed
-is through the use of those OpenACC directives which make use of the
-@code{async} and @code{wait} clauses.  When the @code{async} clause is
-first used with a directive, it will create a CUDA stream.  If an
-@code{async-argument} is used with the @code{async} clause, then the
-stream will be associated with the specified @code{async-argument}.
-
-Following the creation of an association between a CUDA stream and the
-@code{async-argument} of an @code{async} clause, both the @code{wait}
-clause and the @code{wait} directive can be used.  When either the
-clause or directive is used after stream creation, it creates a
-rendezvous point whereby execution will wait until all operations
-associated with the @code{async-argument}, that is, stream, have
-completed.
-
-Normally, the management of the streams that are created as a result of
-using the @code{async} clause, is done without any intervention by the
-caller.  This implies the association between the @code{async-argument}
-and the CUDA stream will be maintained for the lifetime of the program.
-However, this association can be changed through the use of the library
-function @code{acc_set_cuda_stream}.  When the function
-@code{acc_set_cuda_stream} is used, the CUDA stream that was
-originally associated with the @code{async} clause will be destroyed.
-Caution should be taken when changing the association as subsequent
-references to the @code{async-argument} will be referring to a different
-CUDA stream.
-
-
-
-@c ---------------------------------------------------------------------
-@c OpenACC Library Interoperability
-@c ---------------------------------------------------------------------
-
-@node OpenACC Library Interoperability
-@chapter OpenACC Library Interoperability
-
-@section Introduction
-
-As the OpenACC library is built using the CUDA Driver API, the question has
-arisen on what impact does using the OpenACC library have on a program that
-uses the Runtime library, or a library based on the Runtime library, e.g.,
-CUBLAS@footnote{See section 2.26, "Interactions with the CUDA Driver API" in
-"CUDA Runtime API", Version 5.5, July 2013 and section 2.27, "VDPAU
-Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
-July 2013, for additional information on library interoperability.}.
-This chapter will describe the use cases and what changes are
-required in order to use both the OpenACC library and the CUBLAS and Runtime
-libraries within a program.
-
-@section First invocation: NVIDIA CUBLAS library API
-
-In this first use case (see below), a function in the CUBLAS library is called
-prior to any of the functions in the OpenACC library. More specifically, the
-function @code{cublasCreate()}.
-
-When invoked, the function will initialize the library and allocate the
-hardware resources on the host and the device on behalf of the caller. Once
-the initialization and allocation has completed, a handle is returned to the
-caller. The OpenACC library also requires initialization and allocation of
-hardware resources. Since the CUBLAS library has already allocated the
-hardware resources for the device, all that is left to do is to initialize
-the OpenACC library and acquire the hardware resources on the host.
-
-Prior to calling the OpenACC function that will initialize the library and
-allocate the host hardware resources, one needs to acquire the device number
-that was allocated during the call to @code{cublasCreate()}. The invoking of the
-runtime library function @code{cudaGetDevice()} will accomplish this. Once
-acquired, the device number is passed along with the device type as
-parameters to the OpenACC library function @code{acc_set_device_num()}.
-
-Once the call to @code{acc_set_device_num()} has completed, the OpenACC
-library will be using the  context that was created during the call to
-@code{cublasCreate()}. In other words, both libraries will be sharing the
-same context.
-
-@verbatim
-    /* Create the handle */
-    s = cublasCreate(&h);
-    if (s != CUBLAS_STATUS_SUCCESS)
-    {
-        fprintf(stderr, "cublasCreate failed %d\n", s);
-        exit(EXIT_FAILURE);
-    }
-
-    /* Get the device number */
-    e = cudaGetDevice(&dev);
-    if (e != cudaSuccess)
-    {
-        fprintf(stderr, "cudaGetDevice failed %d\n", e);
-        exit(EXIT_FAILURE);
-    }
-
-    /* Initialize OpenACC library and use device 'dev' */
-    acc_set_device_num(dev, acc_device_nvidia);
-
-@end verbatim
-@center Use Case 1 
-
-@section First invocation: OpenACC library API
-
-In this second use case (see below), a function in the OpenACC library is
-called prior to any of the functions in the CUBLAS library. More specificially,
-the function acc_set_device_num().
-
-In the use case presented here, the function @code{acc_set_device_num()}
-is used to both initialize the OpenACC library and allocate the hardware
-resources on the host and the device. In the call to the function, the
-call parameters specify which device to use, i.e., 'dev', and what device
-type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
-is but one method to initialize the OpenACC library and allocate the
-appropriate hardware resources. Other methods are available through the
-use of environment variables and these will be discussed in the next section.
-
-Once the call to @code{acc_set_device_num()} has completed, other OpenACC
-functions can be called as seen with multiple calls being made to
-@code{acc_copyin()}. In addition, calls can be made to functions in the
-CUBLAS library. In the use case a call to @code{cublasCreate()} is made
-subsequent to the calls to @code{acc_copyin()}.
-As seen in the previous use case, a call to @code{cublasCreate()} will
-initialize the CUBLAS library and allocate the hardware resources on the
-host and the device.  However, since the device has already been allocated,
-@code{cublasCreate()} will only initialize the CUBLAS library and allocate
-the appropriate hardware resources on the host. The context that was created
-as part of the OpenACC initialization will be shared with the CUBLAS library,
-similarly to the first use case.
-
-@verbatim
-    dev = 0;
-
-    acc_set_device_num(dev, acc_device_nvidia);
-
-    /* Copy the first set to the device */
-    d_X = acc_copyin(&h_X[0], N * sizeof (float));
-    if (d_X == NULL)
-    { 
-        fprintf(stderr, "copyin error h_X\n");
-        exit(EXIT_FAILURE);
-    }
-
-    /* Copy the second set to the device */
-    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
-    if (d_Y == NULL)
-    { 
-        fprintf(stderr, "copyin error h_Y1\n");
-        exit(EXIT_FAILURE);
-    }
-
-    /* Create the handle */
-    s = cublasCreate(&h);
-    if (s != CUBLAS_STATUS_SUCCESS)
-    {
-        fprintf(stderr, "cublasCreate failed %d\n", s);
-        exit(EXIT_FAILURE);
-    }
-
-    /* Perform saxpy using CUBLAS library function */
-    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
-    if (s != CUBLAS_STATUS_SUCCESS)
-    {
-        fprintf(stderr, "cublasSaxpy failed %d\n", s);
-        exit(EXIT_FAILURE);
-    }
-
-    /* Copy the results from the device */
-    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
-
-}
-@end verbatim
-@center Use Case 2
-
-@section OpenACC library and environment variables
-
-There are two environment variables associated with the OpenACC library that
-may be used to control the device type and device number.
-Namely, @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}. In the second
-use case, the device type and device number were specified using
-@code{acc_set_device_num()}. However, @env{ACC_DEVICE_TYPE} and 
-@env{ACC_DEVICE_NUM} could have been defined and the call to
-@code{acc_set_device_num()} would be not be required. At the time of the
-call to @code{acc_copyin()}, these two environment variables would be
-sampled and their values used.
-
-The use of the environment variables is only relevant when an OpenACC function
-is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()}
-is called prior to a call to an OpenACC function, then a call to
-@code{acc_set_device_num()}, must be done@footnote{More complete information
-about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
-sections 4.1 and 4.2 of the “The OpenACC
-Application Programming Interface”, Version 2.0, June, 2013.}.
-
-
-
-@c ---------------------------------------------------------------------
 @c Enabling OpenMP
 @c ---------------------------------------------------------------------
 
@@ -2440,6 +1803,1272 @@ 
 
 
 @c ---------------------------------------------------------------------
+@c Enabling OpenACC
+@c ---------------------------------------------------------------------
+
+@node Enabling OpenACC
+@chapter Enabling OpenACC
+
+To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
+flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
+@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
+@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
+@code{!$} conditional compilation sentinels in free form and @code{c$},
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
+arranges for automatic linking of the OpenACC runtime library 
+(@ref{OpenACC Runtime Library Routines}).
+
+A complete description of all OpenACC directives accepted may be found in 
+the @uref{http://www.openacc.org/, OpenACC} Application Programming
+Interface manual, version 2.0.
+
+Note that this is an experimental feature and subject to
+change in future versions of GCC.  See
+@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Runtime Library Routines
+@c ---------------------------------------------------------------------
+
+@node OpenACC Runtime Library Routines
+@chapter OpenACC Runtime Library Routines
+
+The runtime routines described here are defined by section 3 of the OpenACC
+specifications in version 2.0.
+They have C linkage, and do not throw exceptions.
+Generally, they are available only for the host, with the exception of
+@code{acc_on_device}, which is available for both the host and the
+acceleration device.
+
+@menu
+* acc_get_num_devices::         Get number of devices for the given device
+                                type.
+* acc_set_device_type::         Set type of device accelerator to use.
+* acc_get_device_type::         Get type of device accelerator to be used.
+* acc_set_device_num::          Set device number to use.
+* acc_get_device_num::          Get device number to be used.
+* acc_async_test::              Tests for completion of a specific asynchronous
+                                operation.
+* acc_async_test_all::          Tests for completion of all asychronous
+                                operations.
+* acc_wait::                    Wait for completion of a specific asynchronous
+                                operation.
+* acc_wait_all::                Waits for completion of all asyncrhonous
+                                operations.
+* acc_wait_all_async::          Wait for completion of all asynchronous
+                                operations.
+* acc_wait_async::              Wait for completion of asynchronous operations.
+* acc_init::                    Initialize runtime for a specific device type.
+* acc_shutdown::                Shuts down the runtime for a specific device
+                                type.
+* acc_on_device::               Whether executing on a particular device
+* acc_malloc::                  Allocate device memory.
+* acc_free::                    Free device memory.
+* acc_copyin::                  Allocate device memory and copy host memory to
+                                it.
+* acc_present_or_copyin::       If the data is not present on the device,
+                                allocate device memory and copy from host
+                                memory.
+* acc_create::                  Allocate device memory and map it to host
+                                memory.
+* acc_present_or_create::       If the data is not present on the device,
+                                allocate device memory and map it to host
+                                memory.
+* acc_copyout::                 Copy device memory to host memory.
+* acc_delete::                  Free device memory.
+* acc_update_device::           Update device memory from mapped host memory.
+* acc_update_self::             Update host memory from mapped device memory.
+* acc_map_data::                Map previously allocated device memory to host
+                                memory.
+* acc_unmap_data::              Unmap device memory from host memory.
+* acc_deviceptr::               Get device pointer associated with specific
+                                host address.
+* acc_hostptr::                 Get host pointer associated with specific
+                                device address.
+* acc_is_present::              Indiciate whether host variable / array is
+                                present on device.
+* acc_memcpy_to_device::        Copy host memory to device memory.
+* acc_memcpy_from_device::      Copy device memory to host memory.
+
+API routines for target platforms.
+
+* acc_get_current_cuda_device:: Get CUDA device handle.
+* acc_get_current_cuda_context::Get CUDA context handle.
+* acc_get_cuda_stream::         Get CUDA stream handle.
+* acc_set_cuda_stream::         Set CUDA stream handle.
+@end menu
+
+
+
+@node acc_get_num_devices
+@section @code{acc_get_num_devices} -- Get number of devices for given device type
+@table @asis
+@item @emph{Description}
+This function returns a value indicating the number of devices available
+for the device type specified in @var{devicetype}. 
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
+@item                  @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.1.
+@end table
+
+
+
+@node acc_set_device_type
+@section @code{acc_set_device_type} -- Set type of device accelerator to use.
+@table @asis
+@item @emph{Description}
+This function indicates to the runtime library which device typr, specified
+in @var{devicetype}, to use when executing a parallel or kernels region. 
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.2.
+@end table
+
+
+
+@node acc_get_device_type
+@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
+@table @asis
+@item @emph{Description}
+This function returns what device type will be used when executing a
+parallel or kernels region.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_get_device_type(void)}
+@item                  @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.3.
+@end table
+
+
+
+@node acc_set_device_num
+@section @code{acc_set_device_num} -- Set device number to use.
+@table @asis
+@item @emph{Description}
+This function will indicate to the runtime which device number,
+specified by @var{num}, associated with the specifed device
+type @var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_device_num(int num, acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
+@item                   @tab @code{integer devicenum}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.4.
+@end table
+
+
+
+@node acc_get_device_num
+@section @code{acc_get_device_num} -- Get device number to be used.
+@table @asis
+@item @emph{Description}
+This function returns which device number associated with the specified device
+type @var{devicetype}, will be used when executing a parallel or kernels
+region.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@item                   @tab @code{integer acc_get_device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.5.
+@end table
+
+
+
+@node acc_async_test
+@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
+@table @asis
+@item @emph{Description}
+This function tests for completion of the asynchrounous operation specified
+in @var{arg}. In C/C++, a non-zero value will be returned to indicate
+the specified asynchronous operation has completed. While Fortran will return
+a @code{true}. If the asynchrounous operation has not completed, C/C++ returns
+a zero and Fortran returns a @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
+@item                   @tab @code{integer(kind=acc_handle_kind) arg}
+@item                   @tab @code{logical acc_async_test}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.6.
+@end table
+
+
+
+@node acc_async_test_all
+@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function tests for completion of all asynchrounous operations.
+In C/C++, a non-zero value will be returned to indicate all asynchronous
+operations have completed. While Fortran will return a @code{true}. If
+any asynchronous operation has not completed, C/C++ returns a zero and
+Fortran returns a @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_async_test()}
+@item                   @tab @code{logical acc_get_device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.7.
+@end table
+
+
+
+@node acc_wait
+@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
+@table @asis
+@item @emph{Description}
+This function waits for completion of the asynchronous operation
+specified in @var{arg}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait(arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
+@item                   @tab @code{integer(acc_handle_kind) arg}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.8.
+@end table
+
+
+
+@node acc_wait_all
+@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function waits for the completion of all asynchronous operations.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_async()}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.10.
+@end table
+
+
+
+@node acc_wait_all_async
+@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function enqueues a wait operation on the queue @var{async} for any
+and all asynchronous operations that have been previously enqueued on
+any queue.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
+@item                   @tab @code{integer(acc_handle_kind) async}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.11.
+@end table
+
+
+
+@node acc_wait_async
+@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function enqueues a wait operation on queue @var{async} for any and all
+asynchronous operations enqueued on queue @var{arg}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
+@item                   @tab @code{integer(acc_handle_kind) arg, async}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.9.
+@end table
+
+
+
+@node acc_init
+@section @code{acc_init} -- Initialize runtime for a specific device type.
+@table @asis
+@item @emph{Description}
+This function initializes the runtime for the device type specified in
+@var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.12.
+@end table
+
+
+
+@node acc_shutdown
+@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
+@table @asis
+@item @emph{Description}
+This function shuts down the runtime for the device type specified in
+@var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.13.
+@end table
+
+
+
+@node acc_on_device
+@section @code{acc_on_device} -- Whether executing on a particular device
+@table @asis
+@item @emph{Description}:
+This function returns whether the program is executing on a particular
+device specified in @var{devicetype}. In C/C++ a non-zero value is
+returned to indicate the device is execiting on the specified device type.
+In Fortran, @code{true} will be returned. If the program is not executing
+on the specified device type C/C++ will return a zero, while Fortran will
+return @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_on_device(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@item                   @tab @code{logical acc_on_device}
+@end multitable
+
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.14.
+@end table
+
+
+
+@node acc_malloc
+@section @code{acc_malloc} -- Allocate device memory.
+@table @asis
+@item @emph{Description}
+This function allocates @var{len} bytes of device memory. It returns
+the device address of the allocated memory.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.15.
+@end table
+
+
+
+@node acc_free
+@section @code{acc_free} -- Free device memory.
+@table @asis
+@item @emph{Description}
+Free previously allocated device memory at the device address @code{a}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.16.
+@end table
+
+
+
+@node acc_copyin
+@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
+@table @asis
+@item @emph{Description}
+In C/C++, this function allocates @var{len} bytes of device memory
+and maps it to the specified host address in @var{a}. The device
+address of the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a
+variable or array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.17.
+@end table
+
+
+
+@node acc_present_or_copyin
+@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
+@table @asis
+@item @emph{Description}
+This function tests if the host data specifed by @var{a} and of length
+@var{len} is present or not. If it is not present, then device memory
+will be allocated and the host memory copied. The device address of
+the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
+@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.18.
+@end table
+
+
+
+@node acc_create
+@section @code{acc_create} -- Allocate device memory and map it to host memory.
+@table @asis
+@item @emph{Description}
+This function allocates device memory and maps it to host memory specified
+by the host address @var{a} with a length of @var{len} bytes. In C/C++,
+the function returns the device address of the allocated device memory.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.19.
+@end table
+
+
+
+@node acc_present_or_create
+@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
+@table @asis
+@item @emph{Description}
+This function tests if the host data specifed by @var{a} and of length
+@var{len} is present or not. If it is not present, then device memory
+will be allocated and mapped to host memory. In C/C++, the device address
+of the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
+@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.20.
+@end table
+
+
+
+@node acc_copyout
+@section @code{acc_copyout} -- Copy device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function copies mapped device memory to host memory which is specified
+by host address @var{a} for a length @var{len} bytes in C/C++.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.21.
+@end table
+
+
+
+@node acc_delete
+@section @code{acc_delete} -- Free device memory.
+@table @asis
+@item @emph{Description}
+This function frees previously allocated device memory specified by
+the device address @var{a} and the length of @var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.22.
+@end table
+
+
+
+@node acc_update_device
+@section @code{acc_update_device} -- Update device memory from mapped host memory.
+@table @asis
+@item @emph{Description}
+This function updates the device copy from the previously mapped host memory.
+The host memory is specified with the host address @var{a} and a length of
+@var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.23.
+@end table
+
+
+
+@node acc_update_self
+@section @code{acc_update_self} -- Update host memory from mapped device memory.
+@table @asis
+@item @emph{Description}
+This function updates the host copy from the previously mapped device memory.
+The host memory is specified with the host address @var{a} and a length of
+@var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.24.
+@end table
+
+
+
+@node acc_map_data
+@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function maps previously allocated device and host memory. The device
+memory is specified with the device address @var{d}. The host memory is
+specified with the host address @var{h} and a length of @var{len}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.25.
+@end table
+
+
+
+@node acc_unmap_data
+@section @code{acc_unmap_data} -- Unmap device memory from host memory.
+@table @asis
+@item @emph{Description}
+This function unmaps previously mapped device and host memory. The latter
+specified by @var{h}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.26.
+@end table
+
+
+
+@node acc_deviceptr
+@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
+@table @asis
+@item @emph{Description}
+This function returns the device address that has been mapped to the
+host address specified by @var{h}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.27.
+@end table
+
+
+
+@node acc_hostptr
+@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
+@table @asis
+@item @emph{Description}
+This function returns the host address that has been mapped to the
+device address specified by @var{d}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.28.
+@end table
+
+
+
+@node acc_is_present
+@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
+@table @asis
+@item @emph{Description}
+This function indicates whether the specified host address in @var{a} and a
+length of @var{len} bytes is present on the device. In C/C++, a non-zero
+value is returned to indicate the presence of the mapped memory on the
+device. A zero is returned to indicate the memory is not mapped on the
+device.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes. If the host
+memory is mapped to device memory, then a @code{true} is returned. Otherwise,
+a @code{false} is return to indicate the mapped memory is not present.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_is_present(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{logical acc_is_present}
+@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item                   @tab @code{logical acc_is_present}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.29.
+@end table
+
+
+
+@node acc_memcpy_to_device
+@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
+@table @asis
+@item @emph{Description}
+This function copies host memory specified by host address of @var{src} to
+device memory specified by the device address @var{dest} for a length of
+@var{bytes} bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.30.
+@end table
+
+
+
+@node acc_memcpy_from_device
+@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function copies host memory specified by host address of @var{src} from
+device memory specified by the device address @var{dest} for a length of
+@var{bytes} bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(d_void *dest, h_void *src, size_t bytes);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.31.
+@end table
+
+
+
+@node acc_get_current_cuda_device
+@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA device handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.1.
+@end table
+
+
+
+@node acc_get_current_cuda_context
+@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA context handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_get_current_cuda_context(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.2.
+@end table
+
+
+
+@node acc_get_cuda_stream
+@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA stream handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_get_cuda_stream(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.3.
+@end table
+
+
+
+@node acc_set_cuda_stream
+@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
+@table @asis
+@item @emph{Description}
+This function associates the stream handle specified by @var{stream} with
+the asynchronous value specified by @var{async}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_cuda_stream(int async void *stream);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.4.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Environment Variables
+@c ---------------------------------------------------------------------
+
+@node OpenACC Environment Variables
+@chapter OpenACC Environment Variables
+
+The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
+are defined by section 4 of the OpenACC specification in version 2.0.
+The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
+
+@menu
+* ACC_DEVICE_TYPE::
+* ACC_DEVICE_NUM::
+* GCC_ACC_NOTIFY::
+@end menu
+
+
+
+@node ACC_DEVICE_TYPE
+@section @code{ACC_DEVICE_TYPE}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+4.1.
+@end table
+
+
+
+@node ACC_DEVICE_NUM
+@section @code{ACC_DEVICE_NUM}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+4.2.
+@end table
+
+
+
+@node GCC_ACC_NOTIFY
+@section @code{GCC_ACC_NOTIFY}
+@table @asis
+@item @emph{Description}:
+Print debug information pertaining to the accelerator.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c CUDA Streams Usage
+@c ---------------------------------------------------------------------
+
+@node CUDA Streams Usage
+@chapter CUDA Streams Usage
+
+This applies to the @code{nvptx} plugin only.
+
+The library provides elements that perform asynchronous movement of
+data and asynchronous operation of computing constructs.  This
+asynchronous functionality is implemented by making use of CUDA
+streams@footnote{See "Stream Management" in "CUDA Driver API",
+TRM-06703-001, Version 5.5, for additional information}.
+
+The primary means by that the asychronous functionality is accessed
+is through the use of those OpenACC directives which make use of the
+@code{async} and @code{wait} clauses.  When the @code{async} clause is
+first used with a directive, it creates a CUDA stream.  If an
+@code{async-argument} is used with the @code{async} clause, then the
+stream is associated with the specified @code{async-argument}.
+
+Following the creation of an association between a CUDA stream and the
+@code{async-argument} of an @code{async} clause, both the @code{wait}
+clause and the @code{wait} directive can be used.  When either the
+clause or directive is used after stream creation, it creates a
+rendezvous point whereby execution waits until all operations
+associated with the @code{async-argument}, that is, stream, have
+completed.
+
+Normally, the management of the streams that are created as a result of
+using the @code{async} clause, is done without any intervention by the
+caller.  This implies the association between the @code{async-argument}
+and the CUDA stream will be maintained for the lifetime of the program.
+However, this association can be changed through the use of the library
+function @code{acc_set_cuda_stream}.  When the function
+@code{acc_set_cuda_stream} is called, the CUDA stream that was
+originally associated with the @code{async} clause will be destroyed.
+Caution should be taken when changing the association as subsequent
+references to the @code{async-argument} refer to a different
+CUDA stream.
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Library Interoperability
+@c ---------------------------------------------------------------------
+
+@node OpenACC Library Interoperability
+@chapter OpenACC Library Interoperability
+
+@section Introduction
+
+The OpenACC library uses the CUDA Driver API, and may interact with
+programs that use the Runtime library directly, or another library
+based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
+"Interactions with the CUDA Driver API" in
+"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
+Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
+for additional information on library interoperability.}.
+This chapter describes the use cases and what changes are
+required in order to use both the OpenACC library and the CUBLAS and Runtime
+libraries within a program.
+
+@section First invocation: NVIDIA CUBLAS library API
+
+In this first use case (see below), a function in the CUBLAS library is called
+prior to any of the functions in the OpenACC library. More specifically, the
+function @code{cublasCreate()}.
+
+When invoked, the function initializes the library and allocates the
+hardware resources on the host and the device on behalf of the caller. Once
+the initialization and allocation has completed, a handle is returned to the
+caller. The OpenACC library also requires initialization and allocation of
+hardware resources. Since the CUBLAS library has already allocated the
+hardware resources for the device, all that is left to do is to initialize
+the OpenACC library and acquire the hardware resources on the host.
+
+Prior to calling the OpenACC function that initializes the library and
+allocate the host hardware resources, you need to acquire the device number
+that was allocated during the call to @code{cublasCreate()}. The invoking of the
+runtime library function @code{cudaGetDevice()} accomplishes this. Once
+acquired, the device number is passed along with the device type as
+parameters to the OpenACC library function @code{acc_set_device_num()}.
+
+Once the call to @code{acc_set_device_num()} has completed, the OpenACC
+library uses the  context that was created during the call to
+@code{cublasCreate()}. In other words, both libraries will be sharing the
+same context.
+
+@smallexample
+    /* Create the handle */
+    s = cublasCreate(&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasCreate failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Get the device number */
+    e = cudaGetDevice(&dev);
+    if (e != cudaSuccess)
+    @{
+        fprintf(stderr, "cudaGetDevice failed %d\n", e);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Initialize OpenACC library and use device 'dev' */
+    acc_set_device_num(dev, acc_device_nvidia);
+
+@end smallexample
+@center Use Case 1 
+
+@section First invocation: OpenACC library API
+
+In this second use case (see below), a function in the OpenACC library is
+called prior to any of the functions in the CUBLAS library. More specificially,
+the function @code{acc_set_device_num()}.
+
+In the use case presented here, the function @code{acc_set_device_num()}
+is used to both initialize the OpenACC library and allocate the hardware
+resources on the host and the device. In the call to the function, the
+call parameters specify which device to use and what device
+type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
+is but one method to initialize the OpenACC library and allocate the
+appropriate hardware resources. Other methods are available through the
+use of environment variables and these will be discussed in the next section.
+
+Once the call to @code{acc_set_device_num()} has completed, other OpenACC
+functions can be called as seen with multiple calls being made to
+@code{acc_copyin()}. In addition, calls can be made to functions in the
+CUBLAS library. In the use case a call to @code{cublasCreate()} is made
+subsequent to the calls to @code{acc_copyin()}.
+As seen in the previous use case, a call to @code{cublasCreate()}
+initializes the CUBLAS library and allocates the hardware resources on the
+host and the device.  However, since the device has already been allocated,
+@code{cublasCreate()} will only initialize the CUBLAS library and allocate
+the appropriate hardware resources on the host. The context that was created
+as part of the OpenACC initialization is shared with the CUBLAS library,
+similarly to the first use case.
+
+@smallexample
+    dev = 0;
+
+    acc_set_device_num(dev, acc_device_nvidia);
+
+    /* Copy the first set to the device */
+    d_X = acc_copyin(&h_X[0], N * sizeof (float));
+    if (d_X == NULL)
+    @{ 
+        fprintf(stderr, "copyin error h_X\n");
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Copy the second set to the device */
+    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
+    if (d_Y == NULL)
+    @{ 
+        fprintf(stderr, "copyin error h_Y1\n");
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Create the handle */
+    s = cublasCreate(&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasCreate failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Perform saxpy using CUBLAS library function */
+    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasSaxpy failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Copy the results from the device */
+    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
+
+@end smallexample
+@center Use Case 2
+
+@section OpenACC library and environment variables
+
+There are two environment variables associated with the OpenACC library
+that may be used to control the device type and device number:
+@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respecively. These two
+environement variables can be used as an alternative to calling
+@code{acc_set_device_num()}. As seen in the second use case, the device
+type and device number were specified using @code{acc_set_device_num()}.
+If however, the aforementioned environment variables were set, then the
+call to @code{acc_set_device_num()} would not be required.
+
+
+The use of the environment variables is only relevant when an OpenACC function
+is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()}
+is called prior to a call to an OpenACC function, then you must call
+@code{acc_set_device_num()}@footnote{More complete information
+about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
+sections 4.1 and 4.2 of the @uref{http://www.openacc.org/, OpenACC}
+Application Programming Interface”, Version 2.0.}
+
+
+
+@c ---------------------------------------------------------------------
 @c The libgomp ABI
 @c ---------------------------------------------------------------------