
[4.7,RFC,Fortran] Coarray: Moving towards real parallelization

Message ID 4D6EA9AA.2040408@net-b.de
State New

Commit Message

Tobias Burnus March 2, 2011, 8:33 p.m. UTC
For gcc-patches readers: Since the 2008 version of the standard, Fortran 
has had a built-in means of parallelization called "coarrays". This 
makes Fortran a PGAS language (PGAS = Partitioned Global Address Space); 
the C equivalent is called UPC (Unified Parallel C), but contrary to 
coarrays it is not an ISO standard. (There is a certain demand from 
users for a coarray implementation, and other Fortran vendors are also 
planning to support coarrays or have already implemented them in 
released products. The latter include Cray, Intel and g95.)


The attached patch makes the first moves towards multi-image coarray 
support. The idea is that the front end generates calls to a library. 
That library will first be implemented using MPI (Message Passing 
Interface), which brings me to a problem: How to include that library, 
which depends heavily on the MPI installation, in GCC?

(A shared-memory version is planned later, but first the MPI-based 
library needs to be working.)
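
To give an idea of the interface, here is a rough sketch of the library
entry points the front end calls (the authoritative declarations are in
libgfortrancaf.h; the argument lists may still change):

void _gfortran_caf_init (int *argc, char ***argv,
                         int *this_image, int *num_images);
void _gfortran_caf_finalize (void);

/* The return value is used for STAT=; errmsg is filled on error.  */
int _gfortran_caf_sync_all (char *errmsg, int errmsg_len);
int _gfortran_caf_sync_images (int count, int images[],
                               char *errmsg, int errmsg_len);

void _gfortran_caf_critical (void);
void _gfortran_caf_end_critical (void);

/* Neither of these returns.  */
void _gfortran_caf_error_stop (int error);
void _gfortran_caf_error_stop_str (char *string, int len);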


Example usage

Compile the parallelization (wrapper) library:
    mpicc -c libgfortrancaf_mpi.c
Compile the actual program:
    mpif90 -fcoarray=lib caf.f90 libgfortrancaf_mpi.o -o caf
Run the program
   mpirun -n 10 ./caf

The attached patch adds the necessary calls for:
- Initialization and finalization
- STOP/ERROR STOP
- SYNC IMAGES, SYNC ALL and SYNC MEMORY
- this_image() and num_images()
- CRITICAL; ...; END CRITICAL block

Known limitations:
- SYNC IMAGES won't work with an array of non-c_int integers
- this_image() only works without arguments
- The library implementation is very rough, and the handling of 
SYNC IMAGES() is mostly wrong
- For CALL EXIT and CALL ABORT the parallelization library is not finalized

NOTE: Anything not listed above is not implemented. In particular, 
accessing remote coarrays is not supported; neither are locking, 
atomics, image_index, etc.

I would like to include this - at least the front-end part - in GCC 
when the 4.7 trunk opens.

Questions:

- Could someone review the front-end patch?

- Where to place the library *.h and *.c files? How to make them 
available with "make install"?

- How to create test cases for the test suite?

- Are there other comments?

- Do we have a function which converts a rank-1 array from, let's say, 
INTEGER(1) or INTEGER(16) to INTEGER(4)? Possibly, the packing of the 
array (if needed) and the conversion could be done in one step...

Tobias

PS: Just for illustration, an example program. Compare the 
-fdump-tree-original dump between -fcoarray=single and -fcoarray=lib, 
and also with -fcheck=bounds. (A rough idea of the calls generated with 
-fcoarray=lib is sketched after the program.)

program test_caf
   character(len=40) :: errstr
   integer:: st
   integer :: images(2) = [1,4]
   print *, 'Hello World', this_image(), num_images()
   SYNC IMAGES(images,errmsg=errstr, stat=st)
   print *, 'Sync Image stat=', st
   SYNC MEMORY
   CRITICAL
      write(*,*) 'In critical section: ',this_image()
   END CRITICAL
end program test_caf
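
With -fcoarray=lib, the translation boils down to calls such as the
following (a hand-written approximation, not the literal dump output):

/* In the compiler-generated main; the two variables are the globals
   backing this_image()/num_images().  */
_gfortran_caf_init (&argc, &argv, &caf_this_image, &caf_num_images);
/* ... */

/* In MAIN__, for the statements above:  */
st = _gfortran_caf_sync_images (2, images, errstr, 40);
__sync_synchronize ();           /* SYNC MEMORY    */
_gfortran_caf_critical ();       /* CRITICAL       */
/* ... body of the critical section ... */
_gfortran_caf_end_critical ();   /* END CRITICAL   */

/* And again in main, before "return 0":  */
_gfortran_caf_finalize ();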

Comments

Ralf Wildenhues March 2, 2011, 9:32 p.m. UTC | #1
Hello Tobias,

* Tobias Burnus wrote on Wed, Mar 02, 2011 at 09:33:46PM CET:
> The attached patch makes the first moves towards multi-image coarray
> support. The idea is that the front end generates calls to a
> library. That library will first be implemented using MPI (Message
> Passing Interface), which brings me to a problem: How to include
> that library, which depends heavily on the MPI installation, in
> GCC?

The library will depend on the MPI installation, obviously.
You can make a new target directory and overwrite CC with MPICC
for that directory.  For choosing the latter, the AX_MPI macro from
the Autoconf Macro Archive can help.  (I can help with the configure
setup if needed, just not timely).

For distributions, they will need to make the library depend on the MPI
implementation (e.g., Debian provides mpicc.mpich, mpicc.mpich2, and
mpicc.openmpi, among others).

> Compile the parallelization (wrapper) library:
>    mpicc -c libgfortrancaf_mpi.c
> Compile the actual program:
>    mpif90 -fcoarray=lib caf.f90 libgfortrancaf_mpi.o -o caf
> Run the program
>   mpirun -n 10 ./caf

mpirun is another, far more troublesome beast, and not standardized at
all; while mpiexec is, it is not fully portable in practice.  I have
some macros to run code on the systems that I know (for llrun, qsub and
so on), but that won't help too much for the general case.

But maybe you don't need to take care of that but can leave it to the
user of GCC.

> - Where to place the library *.h and *.c files? How to make them
> available with "make install"?

Ideally, in some directory where the MPI implementation can easily be
encoded in the installation directory name.

> - How to create test cases for the test suite?

That's somewhat of a problem, see above.

> - Are there other comments?

> This file is part of the GNU Fortran Coarray Runtime library
> (libgfortrancaf).

> /* Global variables.  */
> static int caf_this_image;
> static MPI_Win caf_world_window;

MPI_Win and one-sided communication are MPI 2 features; you need to
test for that, since there are (IME) still implementations out there
that don't fully support it.
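
A cheap guard could look something like this (just a sketch; whether
configure or the library itself should do the check is a separate
question):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Compile-time check: mpi.h advertises its standard version.  */
#if !defined (MPI_VERSION) || MPI_VERSION < 2
# error "the MPI coarray library needs MPI 2 (one-sided communication)"
#endif

/* Run-time check: the linked library may be older than the headers.  */
static void
check_mpi_version (void)
{
  int version, subversion;

  MPI_Get_version (&version, &subversion);
  if (version < 2)
    {
      fprintf (stderr, "MPI %d.%d found, but MPI 2 one-sided "
               "communication is required\n", version, subversion);
      exit (1);
    }
}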

> /* Initialize coarray program.  This routine assumes that no other
>    MPI initialization happened before; otherwise MPI_Initialized
>    had to be used.  As the MPI library might modify the command-line
>    arguments, the routine should be called before the run-time
>    library is initialized.  */
> 
> void
> _gfortran_caf_init (int *argc, char ***argv, int *this_image, int *num_images)
> {
>   MPI_Init (argc, argv);

If you're relying on MPI2 anyway, I suggest you try not to prevent your
users from using MPI themselves.  So please use MPI_Initialized, and
don't finalize if the library isn't responsible.

>   MPI_Comm_rank (MPI_COMM_WORLD, &caf_this_image);
>   *this_image = caf_this_image + 1;
>   MPI_Comm_size (MPI_COMM_WORLD, num_images);
> 
>   /* Obtain window for CRITICAL section locking.  */
>   MPI_Win_create (NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD,
> 		  &caf_world_window);
> }
> 
> 
> /* Finalize coarray program. */
> 
> void
> _gfortran_caf_finalize (void)
> {
>   MPI_Win_free (&caf_world_window);
>   MPI_Finalize ();
> }
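
Something along these lines is what I mean (just a sketch; caf_owns_mpi
is a made-up flag name, and I left out the window handling):

static int caf_owns_mpi = 0;

void
_gfortran_caf_init (int *argc, char ***argv, int *this_image, int *num_images)
{
  int flag;

  MPI_Initialized (&flag);
  if (!flag)
    {
      MPI_Init (argc, argv);
      caf_owns_mpi = 1;
    }

  MPI_Comm_rank (MPI_COMM_WORLD, &caf_this_image);
  *this_image = caf_this_image + 1;
  MPI_Comm_size (MPI_COMM_WORLD, num_images);
}

void
_gfortran_caf_finalize (void)
{
  /* Only finalize MPI if this library initialized it.  */
  if (caf_owns_mpi)
    MPI_Finalize ();
}
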
[...]

> /* ERROR STOP the other images.  */
> 
> static void
> error_stop (int error)
> {
>   /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
>   /* FIXME: Do some more effort than just MPI_ABORT.  */
>   MPI_Abort (MPI_COMM_WORLD, error);
> 
>   /* Should be unreachable, but to make sure also call exit.  */
>   exit (error);
> }
[...]

Thanks,
Ralf
Tobias Burnus March 2, 2011, 11:04 p.m. UTC | #2
Hello Ralf,

thanks for your comments.

Ralf Wildenhues wrote:
> The library will depend on the MPI installation, obviously.
> You can make a new target directory and overwrite CC with MPICC
> for that directory.  For choosing the latter, the AX_MPI macro from
> the Autoconf Macro Archive can help.  (I can help with the configure
> setup if needed, just not timely).

OK. Though as it should be a relatively small library with little 
dependency, I assume many users/system administrators will compile it 
separately.

[run the program]
> But maybe you don't need to take care of that but can leave it to the
> user of GCC.

That was my idea - though having some way to do this for the test suite 
would be useful ...

>> static MPI_Win caf_world_window;
> MPI_Win and one-sided communication are MPI 2 features; you need to
> test for that, since there are (IME) still implementations out there
> that don't fully support it.

The library part is actually not set in stone; there is an MPI 1 
proposal for its implementation. [1]  However, using MPI 2 was the 
fastest route for me. I could imagine that in the end there might be a 
couple of different library implementations. In any case, my current 
focus is mostly on the front end - thus a roughly working library is 
sufficient for me. Whether the polished library will be MPI 1 or MPI 2, 
or conditionally one or the other, the future will tell.

[1] http://gcc.gnu.org/ml/fortran/2010-04/msg00168.html

>>    MPI_Init (argc, argv);
> If you're relying on MPI2 anyway, I suggest you try not to prevent your
> users from using MPI themselves.  So please use MPI_Initialized, and
> don't finalize if the library isn't responsible.

Well, the current implementation relies on an early initialization by 
the front end: the compiler generates a main program, which initializes 
the library. Thus, there is no chance for the user to call MPI earlier. 
However, one should think of having a coarray program, where the main 
program is not a Fortran program. In that case, the parallelization 
library could be called later. However, one then runs into issues like 
initializing the library. I think one has to think carefully about that 
- but supporting coarrays with a non-Fortran main program causes several 
subtle issues.
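
For reference, the main program generated for -fcoarray=lib corresponds
roughly to the following (a simplified, hand-written sketch of what
create_main_function in trans-decl.c now builds):

extern void _gfortran_caf_init (int *, char ***, int *, int *);
extern void _gfortran_caf_finalize (void);
extern void _gfortran_set_args (int, char **);
extern void MAIN__ (void);

/* Globals read by this_image()/num_images().  */
int caf_this_image, caf_num_images;

int
main (int argc, char **argv)
{
  /* The coarray library comes up first; MPI_Init may modify
     argc/argv, hence the call before _gfortran_set_args.  */
  _gfortran_caf_init (&argc, &argv, &caf_this_image, &caf_num_images);

  _gfortran_set_args (argc, argv);
  /* ... the usual libgfortran set-up calls ... */

  MAIN__ ();   /* the actual Fortran main program */

  _gfortran_caf_finalize ();
  return 0;
}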

Tobias

PS: I think I will come back to your offer to help with AX_MPI after the 
first front-end patch is in and the library has been a bit cleaned up.
Richard Biener March 3, 2011, 9:55 a.m. UTC | #3
2011/3/3 Tobias Burnus <burnus@net-b.de>:
> Hello Ralf,
>
> thanks for your comments.
>
> Ralf Wildenhues wrote:
>>
>> The library will depend on the MPI installation, obviously.
>> You can make a new target directory and overwrite CC with MPICC
>> for that directory.  For choosing the latter, the AX_MPI macro from
>> the Autoconf Macro Archive can help.  (I can help with the configure
>> setup if needed, just not timely).
>
> OK. Though as it should be a relatively small library with little
> dependency, I assume many users/system administrators will compile it
> separately.

You will definitely need to wrap the part of the MPI interface you are using.
I think unconditionally using a static library for this is probably best,
both for performance and binary deployment reasons (ok, HPC users
always link statically anyways ...).

That said, don't emit calls following the MPI API but instead use a
(possibly more optimization friendly) gfortran private API that wraps MPI.
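
I.e., the compiler only ever emits calls into the gfortran-private API,
and only the library maps them onto MPI.  Roughly (a sketch only; the
actual mapping and error handling are up to the library):

#include <mpi.h>

/* What the compiler emits a call to ...  */
int _gfortran_caf_sync_all (char *errmsg, int errmsg_len);

/* ... and what only the MPI flavour of the library knows about.  */
int
_gfortran_caf_sync_all (char *errmsg __attribute__ ((unused)),
                        int errmsg_len __attribute__ ((unused)))
{
  /* A real implementation would set errmsg and a proper STAT value.  */
  return MPI_Barrier (MPI_COMM_WORLD) == MPI_SUCCESS ? 0 : 1;
}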

Richard.
Ralf Wildenhues March 4, 2011, 4:57 p.m. UTC | #4
* Tobias Burnus wrote on Thu, Mar 03, 2011 at 12:04:12AM CET:
> Ralf Wildenhues wrote:
> >The library will depend on the MPI installation, obviously.
> >You can make a new target directory and overwrite CC with MPICC
> >for that directory.  For choosing the latter, the AX_MPI macro from
> >the Autoconf Macro Archive can help.  (I can help with the configure
> >setup if needed, just not timely).
> 
> OK. Though as it should be a relatively small library with little
> dependency, I assume many users/system administrators will compile
> it separately.

That should be even easier then.

> >>   MPI_Init (argc, argv);
> >If you're relying on MPI2 anyway, I suggest you try not to prevent your
> >users from using MPI themselves.  So please use MPI_Initialized, and
> >don't finalize if the library isn't responsible.
> 
> Well, the current implementation relies on an early initialization
> by the front end: the compiler generates a main program, which
> initializes the library. Thus, there is no chance for the user to
> call MPI earlier.

Ouch.  That then also means that the user will not be able to use MPI
herself in the program: for them, MPI_Init first thing in main is the
documented way to initialize MPI, but it will invoke undefined behavior
with the above setup.

> However, one should think of having a coarray
> program, where the main program is not a Fortran program. In that
> case, the parallelization library could be called later. However,
> one then runs into issues like initializing the library. I think one
> has to think carefully about that - but supporting coarrays with a
> non-Fortran main program causes several subtle issues.

Could be; but I think you should aim for something like that in the end,
as it would be significantly more user-friendly.

Thanks,
Ralf
Tobias Burnus March 4, 2011, 5:52 p.m. UTC | #5
On 03/04/2011 05:57 PM, Ralf Wildenhues wrote:
>>>>    MPI_Init (argc, argv);
>>> If you're relying on MPI2 anyway, I suggest you try not to prevent your
>>> users from using MPI themselves.  So please use MPI_Initialized, and
>>> don't finalize if the library isn't responsible.
>> Well, the current implementation relies on an early initialization
>> by the front end: the compiler generates a main program, which
>> initializes the library. Thus, there is no chance for the user to
>> call MPI earlier.
> Ouch.  That then also means that the user will not be able to use MPI
> herself in the program: for them, MPI_Init first thing in main is the
> documented way to initialize MPI, but it will invoke undefined behavior
> with the above setup.

Well, the yellow book also states that one should use MPI_Initialized 
to check whether MPI is already initialized. If users want to combine 
MPI with coarrays, they have to be prepared for the possibility that MPI 
is also used as the backend communication library for the coarrays. 
That's always the problem as soon as one mixes parallelization schemes 
(OpenMP, pthread, MPI, GASNet, ARMCI/GA, shmem, ...) or different 
programming languages (C, UPC, C++, Ada, Pascal, Go, ...): one needs to 
be careful, may need to add additional checks, and possibly add some 
function calls to initialize libraries.

Besides, I do not see any possibility of preventing the issue, except 
using a different communication library than MPI. The MPI initialization 
can happen at any time in the program, e.g. deep in some linked library 
- or it might never happen. Thus, it makes sense to initialize the 
coarray library early - that is, before the actual user code starts.

Note further that the front end does not know which coarray 
communication library is linked. It can be the MPI one, it can be a stub 
one (effectively single-image), or it could be a pthread-based one or 
some other library such as GASNet or ARMCI, which might in turn use MPI 
as a backend.

However, in order to make it easier to use the library with a 
non-Fortran main program, I have added an MPI_Initialized check, and I 
will later document in gfortran.texi how to initialize the library in 
that case.

>> However, one should think of having a coarray
>> program, where the main program is not a Fortran program. In that
>> case, the parallelization library could be called later. However,
>> one then runs into issues like initializing the library. I think one
>> has to think carefully about that - but supporting coarrays with a
>> non-Fortran main program causes several subtle issues.
> Could be; but I think you should aim for something like that in the end,
> as it would be significantly more user-friendly.

I do not see any difference from the status quo, except for documenting 
the library functions and adding an MPI_Initialized check. That won't be 
user-friendly, but mixed-language programming is rarely user-friendly.

Tobias

PS: Actually, one could delay the initialization of the library until 
its first use, but as one does not know when that happens and as an 
MPI_Init could come (hidden via a function call) at any point, one would 
effectively need to add a check before every communication whether the 
library is initialized, which would kill the performance and would still 
not help if the first coarray communication comes before the MPI_Init ...

Patch

 gfortran.h        |    5 -
 intrinsic.c       |    3
 invoke.texi       |    6 +
 iresolve.c        |   10 ++-
 libgfortran.h     |    3
 options.c         |    2
 simplify.c        |    6 +
 trans-decl.c      |  118 +++++++++++++++++++++++++++++++++++++
 trans-intrinsic.c |   22 ++++++
 trans-stmt.c      |  172 +++++++++++++++++++++++++++++++++++++++++++++++-------
 trans.h           |   19 +++++
 11 files changed, 340 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index b64fa20..9a6907e 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -458,7 +458,7 @@  enum gfc_isym_id
   GFC_ISYM_NORM2,
   GFC_ISYM_NOT,
   GFC_ISYM_NULL,
-  GFC_ISYM_NUMIMAGES,
+  GFC_ISYM_NUM_IMAGES,
   GFC_ISYM_OR,
   GFC_ISYM_PACK,
   GFC_ISYM_PARITY,
@@ -572,7 +572,8 @@  init_local_integer;
 typedef enum
 {
   GFC_FCOARRAY_NONE = 0,
-  GFC_FCOARRAY_SINGLE
+  GFC_FCOARRAY_SINGLE,
+  GFC_FCOARRAY_LIB
 }
 gfc_fcoarray;
 
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 80dbaa8..0fea078 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -2358,7 +2358,8 @@  add_functions (void)
 
   make_generic ("null", GFC_ISYM_NULL, GFC_STD_F95);
 
-  add_sym_0 ("num_images", GFC_ISYM_NUMIMAGES, CLASS_INQUIRY, ACTUAL_NO, BT_INTEGER, di, GFC_STD_F2008,
+  add_sym_0 ("num_images", GFC_ISYM_NUM_IMAGES, CLASS_INQUIRY, ACTUAL_NO,
+	     BT_INTEGER, di, GFC_STD_F2008,
 	     NULL, gfc_simplify_num_images, NULL);
 
   add_sym_3 ("pack", GFC_ISYM_PACK, CLASS_TRANSFORMATIONAL, ACTUAL_NO, BT_REAL, dr, GFC_STD_F95,
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 291d1e7..f934015 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -166,7 +166,7 @@  and warnings}.
 -fwhole-file -fsecond-underscore @gol
 -fbounds-check -fcheck-array-temporaries  -fmax-array-constructor =@var{n} @gol
 -fcheck=@var{<all|array-temps|bounds|do|mem|pointer|recursion>} @gol
--fcoarray=@var{<none|single>} -fmax-stack-var-size=@var{n} @gol
+-fcoarray=@var{<none|single|lib>} -fmax-stack-var-size=@var{n} @gol
 -fpack-derived  -frepack-arrays  -fshort-enums  -fexternal-blas @gol
 -fblas-matmul-limit=@var{n} -frecursive -finit-local-zero @gol
 -finit-integer=@var{n} -finit-real=@var{<zero|inf|-inf|nan|snan>} @gol
@@ -1248,6 +1248,10 @@  statements will produce a compile-time error. (Default)
 
 @item @samp{single}
 Single-image mode, i.e. @code{num_images()} is always one.
+
+@item @samp{lib}
+Library-based coarray parallelization; a suitable GNU Fortran coarray
+library needs to be linked.
 @end table
 
 
diff --git a/gcc/fortran/iresolve.c b/gcc/fortran/iresolve.c
index d8309d2..f023544 100644
--- a/gcc/fortran/iresolve.c
+++ b/gcc/fortran/iresolve.c
@@ -2556,7 +2556,15 @@  gfc_resolve_image_index (gfc_expr *f, gfc_expr *array ATTRIBUTE_UNUSED,
 void
 gfc_resolve_this_image (gfc_expr *f, gfc_expr *array, gfc_expr *dim)
 {
-  resolve_bound (f, array, dim, NULL, "__this_image", true);
+  static char this_image[] = "__this_image";
+  if (array)
+    resolve_bound (f, array, dim, NULL, "__this_image", true);
+  else
+    {
+      f->ts.type = BT_INTEGER;
+      f->ts.kind = gfc_default_integer_kind;
+      f->value.function.name = this_image;
+    }
 }
 
 
diff --git a/gcc/fortran/libgfortran.h b/gcc/fortran/libgfortran.h
index 85a73d8..a87a057 100644
--- a/gcc/fortran/libgfortran.h
+++ b/gcc/fortran/libgfortran.h
@@ -98,12 +98,13 @@  typedef enum
 }
 libgfortran_error_codes;
 
+/* Must be kept in sync with libgfortrancaf.h.  */
 typedef enum
 {
   GFC_STAT_UNLOCKED = 0,
   GFC_STAT_LOCKED,
   GFC_STAT_LOCKED_OTHER_IMAGE,
-  GFC_STAT_STOPPED_IMAGE
+  GFC_STAT_STOPPED_IMAGE /* See LIBERROR_INQUIRE_INTERNAL_UNIT above. */
 }
 libgfortran_stat_codes;
 
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index c116103..d81be4c 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -515,6 +515,8 @@  gfc_handle_coarray_option (const char *arg)
     gfc_option.coarray = GFC_FCOARRAY_NONE;
   else if (strcmp (arg, "single") == 0)
     gfc_option.coarray = GFC_FCOARRAY_SINGLE;
+  else if (strcmp (arg, "lib") == 0)
+    gfc_option.coarray = GFC_FCOARRAY_LIB;
   else
     gfc_fatal_error ("Argument to -fcoarray is not valid: %s", arg);
 }
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index bb8b575..69edad8 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -4591,6 +4591,9 @@  gfc_simplify_num_images (void)
       return &gfc_bad_expr;
     }
 
+  if (gfc_option.coarray != GFC_FCOARRAY_SINGLE)
+    return NULL;
+
   /* FIXME: gfc_current_locus is wrong.  */
   result = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
 				  &gfc_current_locus);
@@ -6313,6 +6316,9 @@  gfc_simplify_this_image (gfc_expr *coarray, gfc_expr *dim)
   gfc_array_spec *as;
   int d;
 
+  if (gfc_option.coarray != GFC_FCOARRAY_SINGLE)
+    return NULL;
+
   if (coarray == NULL)
     {
       gfc_expr *result;
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 08207e0..3e63de5 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -111,6 +111,22 @@  tree gfor_fndecl_in_unpack;
 tree gfor_fndecl_associated;
 
 
+/* Coarray run-time library function decls.  */
+tree gfor_fndecl_caf_init;
+tree gfor_fndecl_caf_finalize;
+tree gfor_fndecl_caf_critical;
+tree gfor_fndecl_caf_end_critical;
+tree gfor_fndecl_caf_sync_all;
+tree gfor_fndecl_caf_sync_images;
+tree gfor_fndecl_caf_error_stop;
+tree gfor_fndecl_caf_error_stop_str;
+
+/* Coarray global variables for num_images/this_image.  */
+
+tree gfort_gvar_caf_num_images;
+tree gfort_gvar_caf_this_image;
+
+
 /* Math functions.  Many other math functions are handled in
    trans-intrinsic.c.  */
 
@@ -3003,6 +3019,50 @@  gfc_build_builtin_function_decls (void)
   DECL_PURE_P (gfor_fndecl_associated) = 1;
   TREE_NOTHROW (gfor_fndecl_associated) = 1;
 
+  /* Coarray library calls.  */
+  if (gfc_option.coarray == GFC_FCOARRAY_LIB)
+    {
+      tree pint_type, pppchar_type;
+
+      pint_type = build_pointer_type (integer_type_node);
+      pppchar_type
+	= build_pointer_type (build_pointer_type (pchar_type_node));
+
+      gfor_fndecl_caf_init = gfc_build_library_function_decl (
+		   get_identifier (PREFIX("caf_init")),  void_type_node,
+		   4, pint_type, pppchar_type, pint_type, pint_type);
+
+      gfor_fndecl_caf_finalize = gfc_build_library_function_decl (
+	get_identifier (PREFIX("caf_finalize")), void_type_node, 0);
+
+      gfor_fndecl_caf_critical = gfc_build_library_function_decl (
+	get_identifier (PREFIX("caf_critical")), void_type_node, 0);
+
+      gfor_fndecl_caf_end_critical = gfc_build_library_function_decl (
+	get_identifier (PREFIX("caf_end_critical")), void_type_node, 0);
+
+      gfor_fndecl_caf_sync_all = gfc_build_library_function_decl_with_spec (
+	get_identifier (PREFIX("caf_sync_all")), ".W", integer_type_node,
+	2, build_pointer_type (pchar_type_node), integer_type_node);
+
+      gfor_fndecl_caf_sync_images = gfc_build_library_function_decl_with_spec (
+	get_identifier (PREFIX("caf_sync_images")), ".RRW", integer_type_node,
+	4, integer_type_node, pint_type, build_pointer_type (pchar_type_node),
+	integer_type_node);
+
+      gfor_fndecl_caf_error_stop = gfc_build_library_function_decl (
+	get_identifier (PREFIX("caf_error_stop")),
+	void_type_node, 1, gfc_int4_type_node);
+      /* CAF's ERROR STOP doesn't return.  */
+      TREE_THIS_VOLATILE (gfor_fndecl_caf_error_stop) = 1;
+
+      gfor_fndecl_caf_error_stop_str = gfc_build_library_function_decl_with_spec (
+	get_identifier (PREFIX("caf_error_stop_str")), ".R.",
+	void_type_node, 2, pchar_type_node, gfc_int4_type_node);
+      /* CAF's ERROR STOP doesn't return.  */
+      TREE_THIS_VOLATILE (gfor_fndecl_caf_error_stop_str) = 1;
+    }
+
   gfc_build_intrinsic_function_decls ();
   gfc_build_intrinsic_lib_fndecls ();
   gfc_build_io_library_fndecls ();
@@ -4405,6 +4465,40 @@  add_argument_checking (stmtblock_t *block, gfc_symbol *sym)
 }
 
 
+void
+gfc_init_coarray_decl (void)
+{
+  tree save_fn_decl = current_function_decl;
+
+  if (gfc_option.coarray != GFC_FCOARRAY_LIB)
+    return;
+
+  if (gfort_gvar_caf_this_image || gfort_gvar_caf_num_images)
+    return;
+
+  save_fn_decl = current_function_decl;
+  current_function_decl = NULL_TREE;
+  push_cfun (cfun);
+
+  gfort_gvar_caf_this_image = gfc_create_var (integer_type_node,
+					      PREFIX("caf_this_image"));
+  DECL_ARTIFICIAL (gfort_gvar_caf_this_image) = 1;
+  TREE_USED (gfort_gvar_caf_this_image) = 1;
+  TREE_PUBLIC (gfort_gvar_caf_this_image) = 1;
+  TREE_STATIC (gfort_gvar_caf_this_image) = 1;
+
+  gfort_gvar_caf_num_images = gfc_create_var (integer_type_node,
+					      PREFIX("caf_num_images"));
+  DECL_ARTIFICIAL (gfort_gvar_caf_num_images) = 1;
+  TREE_USED (gfort_gvar_caf_num_images) = 1;
+  TREE_PUBLIC (gfort_gvar_caf_num_images) = 1;
+  TREE_STATIC (gfort_gvar_caf_num_images) = 1;
+
+  pop_cfun ();
+  current_function_decl = save_fn_decl;
+}
+
+
 static void
 create_main_function (tree fndecl)
 {
@@ -4484,6 +4578,23 @@  create_main_function (tree fndecl)
 
   /* Call some libgfortran initialization routines, call then MAIN__(). */
 
+  /* Call _gfortran_caf_init (*argc, ***argv, *this_image, *num_images).  */
+  if (gfc_option.coarray == GFC_FCOARRAY_LIB)
+    {
+      tree pint_type, pppchar_type;
+      pint_type = build_pointer_type (integer_type_node);
+      pppchar_type
+	= build_pointer_type (build_pointer_type (pchar_type_node));
+
+      gfc_init_coarray_decl ();
+      tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_init, 4,
+		gfc_build_addr_expr (pint_type, argc),
+		gfc_build_addr_expr (pppchar_type, argv),
+		gfc_build_addr_expr (pint_type, gfort_gvar_caf_this_image),
+		gfc_build_addr_expr (pint_type, gfort_gvar_caf_num_images));
+      gfc_add_expr_to_block (&body, tmp);
+    }
+
   /* Call _gfortran_set_args (argc, argv).  */
   TREE_USED (argc) = 1;
   TREE_USED (argv) = 1;
@@ -4601,6 +4712,13 @@  create_main_function (tree fndecl)
   /* Mark MAIN__ as used.  */
   TREE_USED (fndecl) = 1;
 
+  /* Coarray: Call _gfortran_caf_finalize(void).  */
+  if (gfc_option.coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_finalize, 0);
+      gfc_add_expr_to_block (&body, tmp);
+    }
+
   /* "return 0".  */
   tmp = fold_build2_loc (input_location, MODIFY_EXPR, integer_type_node,
 			 DECL_RESULT (ftn_main),
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 502a815..100620d 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -918,6 +918,20 @@  gfc_conv_intrinsic_exponent (gfc_se *se, gfc_expr *expr)
   se->expr = fold_convert (type, res);
 }
 
+static void
+trans_this_image (gfc_se * se, gfc_expr *expr ATTRIBUTE_UNUSED)
+{
+  gfc_init_coarray_decl ();
+  se->expr = gfort_gvar_caf_this_image;
+}
+
+static void
+trans_num_images (gfc_se * se)
+{
+  gfc_init_coarray_decl ();
+  se->expr = gfort_gvar_caf_num_images;
+}
+
 /* Evaluate a single upper or lower bound.  */
 /* TODO: bound intrinsic generates way too much unnecessary code.  */
 
@@ -6111,6 +6125,14 @@  gfc_conv_intrinsic_function (gfc_se * se, gfc_expr * expr)
       gfc_conv_intrinsic_loc (se, expr);
       break;
 
+    case GFC_ISYM_THIS_IMAGE:
+      trans_this_image (se, expr);
+      break;
+
+    case GFC_ISYM_NUM_IMAGES:
+      trans_num_images (se);
+      break;
+
     case GFC_ISYM_ACCESS:
     case GFC_ISYM_CHDIR:
     case GFC_ISYM_CHMOD:
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 98fb74c..4de490c8 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -599,11 +599,20 @@  gfc_trans_stop (gfc_code *code, bool error_stop)
   gfc_init_se (&se, NULL);
   gfc_start_block (&se.pre);
 
+  if (gfc_option.coarray == GFC_FCOARRAY_LIB && !error_stop)
+    {
+      tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_finalize, 0);
+      gfc_add_expr_to_block (&se.pre, tmp);
+    }
+
   if (code->expr1 == NULL)
     {
       tmp = build_int_cst (gfc_int4_type_node, 0);
       tmp = build_call_expr_loc (input_location,
-				 error_stop ? gfor_fndecl_error_stop_string
+				 error_stop
+				 ? (gfc_option.coarray == GFC_FCOARRAY_LIB
+				    ? gfor_fndecl_caf_error_stop_str
+				    : gfor_fndecl_error_stop_string)
 				 : gfor_fndecl_stop_string,
 				 2, build_int_cst (pchar_type_node, 0), tmp);
     }
@@ -611,7 +620,10 @@  gfc_trans_stop (gfc_code *code, bool error_stop)
     {
       gfc_conv_expr (&se, code->expr1);
       tmp = build_call_expr_loc (input_location,
-				 error_stop ? gfor_fndecl_error_stop_numeric
+				 error_stop
+				 ? (gfc_option.coarray == GFC_FCOARRAY_LIB
+				    ? gfor_fndecl_caf_error_stop
+				    : gfor_fndecl_error_stop_numeric)
 				 : gfor_fndecl_stop_numeric_f08, 1, 
 				 fold_convert (gfc_int4_type_node, se.expr));
     }
@@ -619,7 +631,10 @@  gfc_trans_stop (gfc_code *code, bool error_stop)
     {
       gfc_conv_expr_reference (&se, code->expr1);
       tmp = build_call_expr_loc (input_location,
-				 error_stop ? gfor_fndecl_error_stop_string
+				 error_stop
+				 ? (gfc_option.coarray == GFC_FCOARRAY_LIB
+				    ? gfor_fndecl_caf_error_stop_str
+				    : gfor_fndecl_error_stop_string)
 				 : gfor_fndecl_stop_string,
 				 2, se.expr, se.string_length);
     }
@@ -633,14 +648,55 @@  gfc_trans_stop (gfc_code *code, bool error_stop)
 
 
 tree
-gfc_trans_sync (gfc_code *code, gfc_exec_op type __attribute__ ((unused)))
+gfc_trans_sync (gfc_code *code, gfc_exec_op type)
 {
-  gfc_se se;
+  gfc_se se, argse;
+  tree tmp;
+  tree images = NULL_TREE, stat = NULL_TREE,
+       errmsg = NULL_TREE, errmsglen = NULL_TREE;
+
+  /* Short cut: For single images without bound checking or without STAT=,
+     return early. (ERRMSG= is always untouched for -fcoarray=single.)  */
+  if (!code->expr1 && !(gfc_option.rtcheck & GFC_RTCHECK_BOUNDS)
+      && gfc_option.coarray != GFC_FCOARRAY_LIB)
+    return NULL_TREE; 
 
-  if ((code->expr1 && (gfc_option.rtcheck & GFC_RTCHECK_BOUNDS)) || code->expr2)
+  gfc_init_se (&se, NULL);
+  gfc_start_block (&se.pre);
+
+  if (code->expr1 && code->expr1->rank == 0)
     {
-      gfc_init_se (&se, NULL);
-      gfc_start_block (&se.pre);
+      gfc_init_se (&argse, NULL);
+      gfc_conv_expr_val (&argse, code->expr1);
+      images = argse.expr;
+    }
+
+  /* FIXME: Handle the case that STAT= or ERRMSG= variable
+     is an absent dummy argument. ERRMSG might be OK, but
+     LOCAL STAT might have an issue. */
+
+  if (code->expr2)
+    {
+      gcc_assert (code->expr2->expr_type == EXPR_VARIABLE);
+      gfc_init_se (&argse, NULL);
+      gfc_conv_expr_val (&argse, code->expr2);
+      stat = argse.expr;
+    }
+
+  if (code->expr3 && gfc_option.coarray == GFC_FCOARRAY_LIB
+      && type != EXEC_SYNC_MEMORY)
+    {
+      gcc_assert (code->expr3->expr_type == EXPR_VARIABLE);
+      gfc_init_se (&argse, NULL);
+      gfc_conv_expr (&argse, code->expr3);
+      gfc_conv_string_parameter (&argse);
+      errmsg = argse.expr;
+      errmsglen = argse.string_length;
+    }
+  else if (gfc_option.coarray == GFC_FCOARRAY_LIB && type != EXEC_SYNC_MEMORY)
+    {
+      errmsg = null_pointer_node;
+      errmsglen = build_int_cst (integer_type_node, 0);
     }
 
   /* Check SYNC IMAGES(imageset) for valid image index.
@@ -649,27 +705,88 @@  gfc_trans_sync (gfc_code *code, gfc_exec_op type __attribute__ ((unused)))
       && code->expr1->rank == 0)
     {
       tree cond;
-      gfc_conv_expr (&se, code->expr1);
-      cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node,
-			      se.expr, build_int_cst (TREE_TYPE (se.expr), 1));
+      if (gfc_option.coarray != GFC_FCOARRAY_LIB)
+	cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node,
+				images, build_int_cst (TREE_TYPE (images), 1));
+      else
+	{
+	  tree cond2;
+	  cond = fold_build2_loc (input_location, GE_EXPR, boolean_type_node,
+				  images, gfort_gvar_caf_num_images);
+	  cond2 = fold_build2_loc (input_location, LT_EXPR, boolean_type_node,
+				   images,
+				   build_int_cst (TREE_TYPE (images), 1));
+	  cond = fold_build2_loc (input_location, TRUTH_AND_EXPR,
+				  boolean_type_node, cond, cond2);
+	}
       gfc_trans_runtime_check (true, false, cond, &se.pre,
 			       &code->expr1->where, "Invalid image number "
 			       "%d in SYNC IMAGES",
 			       fold_convert (integer_type_node, se.expr));
     }
 
-  /* If STAT is present, set it to zero.  */
-  if (code->expr2)
+  if (gfc_option.coarray != GFC_FCOARRAY_LIB || type == EXEC_SYNC_MEMORY)
     {
-      gcc_assert (code->expr2->expr_type == EXPR_VARIABLE);
-      gfc_conv_expr (&se, code->expr2);
-      gfc_add_modify (&se.pre, se.expr, build_int_cst (TREE_TYPE (se.expr), 0));
+       if (gfc_option.coarray == GFC_FCOARRAY_LIB)
+	 {
+	   tmp = built_in_decls [BUILT_IN_SYNCHRONIZE];
+	   tmp = build_call_expr_loc (input_location, tmp, 0);
+	   gfc_add_expr_to_block (&se.pre, tmp);
+	 }
+
+      /* Set STAT to zero.  */
+      if (code->expr2)
+	gfc_add_modify (&se.pre, stat, build_int_cst (TREE_TYPE (stat), 0));
+    }
+  else if (type == EXEC_SYNC_ALL)
+    {
+      tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_sync_all,
+				 2, errmsg, errmsglen);
+      if (code->expr2)
+	gfc_add_modify (&se.pre, stat, fold_convert (TREE_TYPE (stat), tmp));
+      else
+	gfc_add_expr_to_block (&se.pre, tmp);
     }
+  else
+    {
+      tree len;
+
+      gcc_assert (type == EXEC_SYNC_IMAGES);
 
-  if ((code->expr1 && (gfc_option.rtcheck & GFC_RTCHECK_BOUNDS)) || code->expr2)
-    return gfc_finish_block (&se.pre);
- 
-  return NULL_TREE;
+      if (!code->expr1)
+	len = build_int_cst (integer_type_node, -1);
+      else if (code->expr1->rank == 0)
+	{
+	  len = build_int_cst (integer_type_node, 1);
+	  images = gfc_build_addr_expr (NULL_TREE, images);
+	}
+      else
+	{
+	  gfc_conv_array_parameter (&se, code->expr1,
+				    gfc_walk_expr (code->expr1), true, NULL,
+				    NULL, &len);
+	  images = se.expr;
+
+	  tmp = gfc_typenode_for_spec (&code->expr1->ts);
+	  if (GFC_ARRAY_TYPE_P (tmp) || GFC_DESCRIPTOR_TYPE_P (tmp))
+	    tmp = gfc_get_element_type (tmp);
+
+	  len = fold_build2_loc (input_location, TRUNC_DIV_EXPR,
+				 integer_type_node, len,
+				 fold_convert (gfc_array_index_type,
+					       TYPE_SIZE_UNIT (tmp)));
+	}
+
+      tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_sync_images, 4,
+				 fold_convert (integer_type_node, len), images,
+				 errmsg, errmsglen);
+      if (code->expr2)
+	gfc_add_modify (&se.pre, stat, fold_convert (TREE_TYPE (stat), tmp));
+      else
+	gfc_add_expr_to_block (&se.pre, tmp);
+    }
+
+  return gfc_finish_block (&se.pre);
 }
 
 
@@ -870,9 +987,24 @@  gfc_trans_critical (gfc_code *code)
   tree tmp;
 
   gfc_start_block (&block);
+
+  if (gfc_option.coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_critical, 0);
+      gfc_add_expr_to_block (&block, tmp);
+    }
+
   tmp = gfc_trans_code (code->block->next);
   gfc_add_expr_to_block (&block, tmp);
 
+  if (gfc_option.coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_end_critical,
+				 0);
+      gfc_add_expr_to_block (&block, tmp);
+    }
+
+
   return gfc_finish_block (&block);
 }
 
diff --git a/gcc/fortran/trans.h b/gcc/fortran/trans.h
index 1536f2e..997259f 100644
--- a/gcc/fortran/trans.h
+++ b/gcc/fortran/trans.h
@@ -452,6 +452,9 @@  bool gfc_get_module_backend_decl (gfc_symbol *);
 /* Return the variable decl for a symbol.  */
 tree gfc_get_symbol_decl (gfc_symbol *);
 
+/* Initialize coarray global variables.  */
+void gfc_init_coarray_decl (void);
+
 /* Build a static initializer.  */
 tree gfc_conv_initializer (gfc_expr *, gfc_typespec *, tree, bool, bool, bool);
 
@@ -613,6 +616,22 @@  extern GTY(()) tree gfor_fndecl_in_pack;
 extern GTY(()) tree gfor_fndecl_in_unpack;
 extern GTY(()) tree gfor_fndecl_associated;
 
+
+/* Coarray run-time library function decls.  */
+extern GTY(()) tree gfor_fndecl_caf_init;
+extern GTY(()) tree gfor_fndecl_caf_finalize;
+extern GTY(()) tree gfor_fndecl_caf_critical;
+extern GTY(()) tree gfor_fndecl_caf_end_critical;
+extern GTY(()) tree gfor_fndecl_caf_sync_all;
+extern GTY(()) tree gfor_fndecl_caf_sync_images;
+extern GTY(()) tree gfor_fndecl_caf_error_stop;
+extern GTY(()) tree gfor_fndecl_caf_error_stop_str;
+
+/* Coarray global variables for num_images/this_image.  */
+extern GTY(()) tree gfort_gvar_caf_num_images;
+extern GTY(()) tree gfort_gvar_caf_this_image;
+
+
 /* Math functions.  Many other math functions are handled in
    trans-intrinsic.c.  */