RFC: Merge the GUPC branch into the GCC 6.0 trunk

Message ID 20151201053129.GA29501@intrepid.com
State New

Commit Message

Gary Funck Dec. 1, 2015, 5:31 a.m. UTC
Some time ago, we submitted an RFC for the introduction of
UPC support into GCC.  During the intervening time period,
we have continued to keep the 'gupc' (GNU UPC) branch in sync
with the GCC trunk and have incorporated feedback and contributions from
various GCC developers (Joseph Myers, Tom Tromey, Jakub Jelinek,
Richard Henderson, Meador Inge, and others).  We have also implemented
various bug fixes and improvements.

At this time, we would like to re-submit the UPC patches for comment
with the goal of introducing these changes into GCC 6.0.

This email provides an overview of UPC and summarizes the
impact of UPC changes on the GCC front-end.

Subsequent emails will include various patch sets which are grouped
by the area of GCC that they impact (front-end, generic, documentation,
build, test, target-specific, and so on), so that they can receive
a more focused review by their respective maintainers.

The main review-related changes are:

* GUPC is no longer implemented as a separate language compiler
(as Objective-C and C++ are).  Rather, a new -fupc switch
has been added, which enables UPC support in the C compiler.

* The UPC blocking factor now only uses two of the tree's
"spare" bits.  If the UPC blocking factor is not the default
value of 1 or the "indefinite" value of 0, then it is recorded
in a separate hash table, indexed by the tree node.

* UPC-specific tree support has been integrated into
gcc/c-family/c-common.def and gcc/c-family/c-common.h.

* The number of UPC-specific configuration options
has been reduced.

* The UPC pointer-to-shared format per-target configuration
has been simplified.  Before, both a "packed" and a "struct"
pointer-to-shared representation were supported.  Now, only
the "struct" format is supported, and various configuration
options for tweaking field sizes and such have been removed.

* In keeping with current GCC development guidelines,
target macros are no longer used.  Rather, where needed,
target hooks are defined and used.

* FIXME's and TODO's were either fixed or cleaned up.

* The copyright and license notices were updated.

* The code was reviewed for conformance to coding standards and updated.

* Diagnostics now use appropriate format strings rather than building
up the strings with sprintf().

* Files in c-family/ no longer include c-tree.h to conform with modularization
improvements.

* Most of the #ifdef conditionals have been removed.  Some target hooks
have been defined and documented in tm.texi.

* The code was reviewed to verify that it conforms with
current GCC coding practices and that it incorporates cleanups
done in the past several years.

* Comments were added to most new functions, and typos and
spelling errors in comments were fixed.

* Changes that appeared in the diffs that were unrelated to UPC
were removed or incorporated into the trunk.

* The linkage to the libgupc library was changed to use the newly
defined method (used in libgomp/libgo, for example) of including
library 'spec' files.  This led to a simplification where we no
longer need to add UPC-specific spec files in various
target-specific config directories.
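
For reviewers who want a concrete picture of the blocking factor
encoding described in the list above, here is a minimal sketch in
plain C.  It is not code from the branch.  The bit names follow the
tree_base fields block_factor_0 and block_factor_x, but their exact
meaning here is an assumption, and struct node, set_block_factor(),
get_block_factor(), and the fixed-size side table (standing in for
the real hash table) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node; only the two flag bits matter.  */
struct node {
  unsigned block_factor_0 : 1;  /* assumed: blocking factor is 0 ("indefinite") */
  unsigned block_factor_x : 1;  /* assumed: value lives in the side table */
};

/* Tiny side table keyed by node address.  A real implementation would
   use a hash table; a linear array keeps the sketch short.  */
#define SIDE_TABLE_MAX 16
static struct {
  const struct node *key;
  unsigned long value;
} side_table[SIDE_TABLE_MAX];
static size_t side_table_len;

/* Record blocking factor BF for node N.  */
void set_block_factor(struct node *n, unsigned long bf) {
  size_t i;
  n->block_factor_0 = (bf == 0);
  n->block_factor_x = (bf > 1);
  if (bf <= 1)
    return;  /* 0 and 1 are fully described by the two bits */
  for (i = 0; i < side_table_len; i++)
    if (side_table[i].key == n) {
      side_table[i].value = bf;  /* update an existing entry */
      return;
    }
  assert(side_table_len < SIDE_TABLE_MAX);
  side_table[side_table_len].key = n;
  side_table[side_table_len].value = bf;
  side_table_len++;
}

/* Return the blocking factor of node N; the default is 1.  */
unsigned long get_block_factor(const struct node *n) {
  size_t i;
  if (n->block_factor_0)
    return 0;
  if (!n->block_factor_x)
    return 1;
  for (i = 0; i < side_table_len; i++)
    if (side_table[i].key == n)
      return side_table[i].value;
  assert(!"block_factor_x set without a side-table entry");
  return 1;
}
```

The point of the design is that the common cases (1 and 0) cost no
storage beyond the two bits, and only unusual blocking factors pay
for a table lookup.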

Introduction: UPC-related Changes
---------------------------------

Below, various UPC-related changes are summarized.
This introduction is provided as background for review of the UPC
changes implemented in the GUPC branch.  Each individual change will be
discussed in more detail in the patch sets found in the following emails.

The current GUPC branch is based upon a recent version of the GCC trunk
and has been bootstrapped on x86_64/i686 Linux, x86_64
Darwin, IA64/Altix Linux, PowerPC Power7 (big endian), and Power8
(little endian).  Some testing has also been done on various flavors
of BSD and Solaris; in the past, MIPS was also tested and supported.

All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
bootstrapped; no test suite regressions were introduced,
relative to the GCC trunk.

The GUPC branch is described here:
  http://gcc.gnu.org/projects/gupc.html

The UPC-related source code differences are summarized here:
  http://gccupc.org/gupc-changes

In the discussion below, some changes are excerpted in order to
highlight important aspects of the changes.

UPC's Shared Qualifier and Layout Qualifier
-------------------------------------------

The UPC language specification describes
the language syntax and semantics:
  http://upc.lbl.gov/publications/upc-spec-1.3.pdf

UPC introduces a new qualifier, "shared", that indicates that the
qualified object is located in a global shared address space that is
accessible by all UPC threads.  Additional qualifiers ("strict" and
"relaxed") further specify the semantics of accesses to
UPC shared objects.

In UPC, a shared-qualified array can optionally specify a "layout
qualifier" that indicates how the shared data is blocked and
distributed across UPC threads.

There are two language pre-defined identifiers that indicate the
number of threads that will be created when the program starts
(THREADS) and the current (zero-based) thread number (MYTHREAD).
Typically, UPC threads are implemented as operating system processes,
though they may be mapped to pthreads when compiled with the
-fupc-pthreads-model-tls switch.

Access to UPC shared memory may be implemented locally via OS-provided
facilities (for example, mmap), or across nodes via a high-speed
network interconnect (for example, InfiniBand).

GUPC provides a runtime (libgupc) that targets an SMP-based system
that uses mmap() to implement global shared memory.

Optionally, GUPC can use the more general and more capable Berkeley
UPCR runtime:
  http://upc.lbl.gov/download/source.shtml#runtime
The UPCR runtime supports a number of network
topologies, and has been ported to most of the
current High Performance Computing (HPC) systems.

The following example illustrates the use of the UPC "shared" qualifier
combined with a layout qualifier.

    #define BLKSIZE 5
    #define N_PER_THREAD (4 * BLKSIZE)
    shared [BLKSIZE] double A[N_PER_THREAD*THREADS];

Above, the "[BLKSIZE]" construct is the UPC layout qualifier; it
specifies that the elements of the shared array A are distributed
across the threads in blocks of 5 elements.  If the program is run
with two threads, then A is distributed as shown below:

    Thread 0	Thread 1
    --------	---------
    A[ 0.. 4]	A[ 5.. 9]
    A[10..14]	A[15..19]
    A[20..24]	A[25..29]
    A[30..34]	A[35..39]

The elements shown for thread 0 are defined as having "affinity"
to thread 0.  Similarly, those elements shown for thread 1 have
affinity to thread 1.  In UPC, a pointer to a shared object can be
cast to a thread local pointer (a "C" pointer), when the designated
shared object has affinity to the referencing thread.
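
The distribution in the table above follows a simple round-robin
rule, which can be sketched in plain C as below.  This is an
illustration only, not code from the branch; affinity_thread() and
local_offset() are hypothetical helper names, and the sketch assumes
a blocking factor greater than zero.

```c
#include <assert.h>
#include <stddef.h>

/* Thread that has affinity to element INDEX of a shared array with
   blocking factor BLOCK, distributed over THREADS threads.  Blocks
   are dealt out to the threads round-robin.  */
size_t affinity_thread(size_t index, size_t block, size_t threads) {
  assert(block > 0 && threads > 0);
  return (index / block) % threads;
}

/* Position (in elements) of element INDEX within the storage of the
   thread that owns it.  */
size_t local_offset(size_t index, size_t block, size_t threads) {
  assert(block > 0 && threads > 0);
  return (index / (block * threads)) * block + index % block;
}
```

With BLKSIZE 5 and two threads, affinity_thread(12, 5, 2) is 0 and
affinity_thread(17, 5, 2) is 1, matching the rows A[10..14] and
A[15..19] in the table.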

A UPC "pointer-to-shared" (PTS) is a pointer that references a UPC
shared object.  A UPC pointer-to-shared is a "fat" pointer with the
following logical fields:
   (virt_addr, thread, phase)

The virtual address (virt_addr) field is combined with the thread
number (thread) to derive the location of the referenced object
within the UPC shared address space.  The phase field is used to
keep track of the current block offset for PTS's that have a
blocking factor that is greater than one.

GUPC implements pointer-to-shared objects using a "struct" representation.
Until recently, GUPC also supported a "packed" representation, which
is more space efficient, but limits the range of various fields
in the UPC pointer-to-shared representation.  We have decided to
support only the "struct" representation so that the compiler uses
a single ABI that supports the full range of addresses, threads,
and blocking factors.
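
To make the role of the three logical fields concrete, the following
plain C sketch advances a pointer-to-shared by a non-negative number
of elements.  It is a sketch under stated assumptions, not GUPC's
implementation: struct pts, its field widths, and pts_add() are
hypothetical, virt_addr is treated as a byte offset within the owning
thread's shared region, and the indefinite blocking factor (0) and
negative offsets are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "struct" pointer-to-shared with the three logical
   fields named in the text; the field widths are assumptions.  */
struct pts {
  size_t virt_addr;  /* byte offset within the owning thread's region */
  size_t thread;     /* thread that has affinity to the object */
  size_t phase;      /* element position within the current block */
};

/* Advance P by N elements of ELEM_SIZE bytes, for blocking factor
   BLOCK and THREADS threads.  Only N >= 0 and BLOCK > 0 are handled.  */
struct pts pts_add(struct pts p, size_t n, size_t elem_size,
                   size_t block, size_t threads) {
  size_t blocks_crossed, rows_crossed;
  struct pts q;

  assert(block > 0 && threads > 0);
  assert(p.phase < block && p.thread < threads);

  /* Whole blocks stepped over, counted from the start of P's block.  */
  blocks_crossed = (p.phase + n) / block;
  /* Each full pass over all threads moves one block further locally.  */
  rows_crossed = (p.thread + blocks_crossed) / threads;

  q.phase = (p.phase + n) % block;
  q.thread = (p.thread + blocks_crossed) % threads;
  q.virt_addr = p.virt_addr
                + elem_size * (rows_crossed * block + q.phase)
                - elem_size * p.phase;
  return q;
}
```

For the array A in the example (8-byte elements, blocking factor 5,
two threads), adding 17 to a pointer to A[0] yields thread 1, phase 2,
and a local byte offset of 56, which is the eighth element stored on
thread 1.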

GCC's internal tree representation is extended to record the UPC
"shared", "strict", "relaxed" qualifiers, and the layout qualifier.

   { "inout",           RID_INOUT,              D_OBJC },
   { "oneway",          RID_ONEWAY,             D_OBJC },
   { "out",             RID_OUT,                D_OBJC },
+
+  /* UPC keywords */
+  { "shared",          RID_SHARED,             D_UPC },
+  { "relaxed",         RID_RELAXED,            D_UPC },
+  { "strict",          RID_STRICT,             D_UPC },
+  { "upc_barrier",     RID_UPC_BARRIER,        D_UPC },
+  { "upc_blocksizeof", RID_UPC_BLOCKSIZEOF,    D_UPC },
+  { "upc_elemsizeof",  RID_UPC_ELEMSIZEOF,     D_UPC },
+  { "upc_forall",      RID_UPC_FORALL,         D_UPC },
+  { "upc_localsizeof", RID_UPC_LOCALSIZEOF,    D_UPC },
+  { "upc_notify",      RID_UPC_NOTIFY,         D_UPC },
+  { "upc_wait",        RID_UPC_WAIT,           D_UPC },
+

--- gcc/c/c-parser.c    (.../trunk)     (revision 228959)
+++ gcc/c/c-parser.c    (.../branches/gupc)     (revision 229159)
[...]
+/* These UPC parser functions are only ever called when
+   compiling UPC.  */
+static void c_parser_upc_forall_statement (c_parser *);
+static void c_parser_upc_sync_statement (c_parser *, int);
+static void c_parser_upc_shared_qual (source_location,
+                                      c_parser *,
+                                     struct c_declspecs *);
+
[...]
+        /* UPC qualifiers */
+       case RID_SHARED:
+         attrs_ok = true;
+         c_parser_upc_shared_qual (loc, parser, specs);
+         break;
+       case RID_STRICT:
+       case RID_RELAXED:
+         attrs_ok = true;
+         declspecs_add_qual (loc, specs, c_parser_peek_token (parser)->value);
+         c_parser_consume_token (parser);
+         break;
[...]
+  /* Process all #pragma's just after the opening brace.  This
+     handles #pragma upc, which can only appear just after
+     the opening brace, when it appears within a function body.  */
+  push_upc_consistency_mode ();
+  permit_pragma_upc ();
+  while (c_parser_next_token_is (parser, CPP_PRAGMA))
+    {
+      location_t loc ATTRIBUTE_UNUSED = c_parser_peek_token (parser)->location;
+      if (c_parser_pragma (parser, pragma_compound))
+        last_label = false, last_stmt = true;
+      parser->error = false;
+    }
+  deny_pragma_upc ();
[...]
+       case RID_UPC_FORALL:
+          gcc_assert (flag_upc);
+         c_parser_upc_forall_statement (parser);
+         break;
+        case RID_UPC_NOTIFY:
+          gcc_assert (flag_upc);
+         c_parser_upc_sync_statement (parser, UPC_SYNC_NOTIFY_OP);
+         goto expect_semicolon;
+        case RID_UPC_WAIT:
+          gcc_assert (flag_upc);
+         c_parser_upc_sync_statement (parser, UPC_SYNC_WAIT_OP);
+         goto expect_semicolon;
+        case RID_UPC_BARRIER:
+          gcc_assert (flag_upc);
+         c_parser_upc_sync_statement (parser, UPC_SYNC_BARRIER_OP);
+         goto expect_semicolon;
[...]
        case RID_SIZEOF:
          return c_parser_sizeof_expression (parser);
+       case RID_UPC_BLOCKSIZEOF:
+       case RID_UPC_ELEMSIZEOF:
+       case RID_UPC_LOCALSIZEOF:
+          gcc_assert (flag_upc);
+         return c_parser_sizeof_expression (parser);
[...]

--- gcc/c-family/c-pragma.c     (.../trunk)     (revision 228959)
+++ gcc/c-family/c-pragma.c     (.../branches/gupc)     (revision 229159)
[...]
+/*
+ *  #pragma upc strict
+ *  #pragma upc relaxed
+ *  #pragma upc upc_code
+ *  #pragma upc c_code
+ */
+static void
+handle_pragma_upc (cpp_reader * ARG_UNUSED (dummy))
+{
[...]

c-decl.c handles the additional UPC qualifiers and declspecs.
The layout qualifier is handled here:

--- gcc/c/c-decl.c      (.../trunk)     (revision 228959)
+++ gcc/c/c-decl.c      (.../branches/gupc)     (revision 229159)
[...]
+  /* A UPC layout qualifier is encoded as an ARRAY_REF,
+     further, it implies the presence of the 'shared' keyword. */
+  if (TREE_CODE (qual) == ARRAY_REF)
+    {
+      if (specs->upc_layout_qualifier)
+        {
+          error ("two or more layout qualifiers specified");
+          return specs;
+        }
+      else
+        {
+          specs->upc_layout_qualifier = qual;
+          qual = ridpointers[RID_SHARED];
+        }
+    }

In UPC, a qualifier includes both the traditional
"C" qualifier flags and the UPC "layout qualifier".
Thus, the pointer_quals field of a declarator node
is replaced by a struct holding both the qualifier
flags and the UPC layout qualifier, as shown below.

            /* Process type qualifiers (such as const or volatile)
               that were given inside the `*'.  */
-           type_quals = declarator->u.pointer_quals;
+           type_quals = declarator->u.pointer.quals;
+           upc_layout_qualifier = declarator->u.pointer.upc_layout_qual;
+           sharedp = ((type_quals & TYPE_QUAL_SHARED) != 0);

UPC shared variables are allocated at runtime in global memory
that is managed by the UPC runtime.  A separate link
section is used as a method of assigning virtual addresses to UPC
shared variables.  The UPC shared variable section is designated as a
"no load" section on systems that support that facility; in that case,
the linkage section begins at virtual address zero.  The logic below
assigns UPC shared variables to their own linkage section.

+    /* Shared variables are given their own link section on
+       most target platforms, and if compiling in pthreads mode
+       regular local file scope variables are made thread local. */
+    if ((TREE_CODE(decl) == VAR_DECL)
+        && !threadp && (TREE_SHARED (decl) || flag_upc_pthreads))
+      upc_set_decl_section (decl);
+

Patches
-------

The patches are organized into the following categories
and will be sent out as separate email messages.

[UPC 01/22] front-end changes
[UPC 02/22] tree-related changes
[UPC 03/22] options processing, driver
[UPC 04/22] Make, Config changes
[UPC 05/22] language hooks changes
[UPC 06/22] target hooks
[UPC 07/22] lowering, pointer-to-shared ops
[UPC 08/22] target - Darwin
[UPC 09/22] target - x86
[UPC 10/22] target - rs6000
[UPC 11/22] documentation
[UPC 12/22] DWARF support
[UPC 13/22] C++ changes
[UPC 14/22] constant folding changes
[UPC 15/22] RTL changes
[UPC 16/22] gimple/gimplify changes
[UPC 17/22] misc/common changes
[UPC 18/22] libatomic changes
[UPC 19/22] libgupc - Make, Configure
[UPC 20/22] libgupc runtime library
[UPC 21/22] gcc.dg test suite
[UPC 22/22] libgupc test suite

thanks,
- Gary

Comments

Richard Biener Dec. 1, 2015, 11:12 a.m. UTC | #1
On Mon, 30 Nov 2015, Gary Funck wrote:

> 
> Some time ago, we submitted an RFC for the introduction of
> UPC support into GCC.  During the intervening time period,
> we have continued to keep the 'gupc' (GNU UPC) branch in sync
> with the GCC trunk and have incorporated feedback and contributions from
> various GCC developers (Joseph Myers, Tom Tromey, Jakub Jelinek,
> Richard Henderson, Meador Inge, and others).  We have also implemented
> various bug fixes and improvements.
> 
> At this time, we would like to re-submit the UPC patches for comment
> with the goal of introducing these changes into GCC 6.0.

First of all let me say that it is IMNSHO now too late for GCC 6.

> This email provides an overview of UPC and summarizes the
> impact of UPC changes on the GCC front-end.
> 
> Subsequent emails will include various patch sets which are grouped
> by the area of GCC that they impact (front-end, generic, documentation,
> build, test, target-specific, and so on), so that they can receive
> a more focused review by their respective maintainers.
> 
> The main review-related changes are:
> 
> * GUPC is no longer implemented as a separate language
> (e.g., Objective-C or C++) compiler.  Rather, a new -fupc switch
> has been added, which enables UPC support in the C compiler.
> 
> * The UPC blocking factor now only uses two of the tree's
> "spare" bits.  If the UPC blocking factor is not the default
> value of 1 or the "indefinite" value of 0, then it is recorded
> in a separate hash table, indexed by the tree node.
> 
> * UPC-specific tree support has been integrated into
> gcc/c-family/c-common.def and gcc/c-family/c-common.h.
> 
> * The number of UPC-specific configuration options
> have been reduced.
> 
> * The UPC pointer-to-shared format per-target configuration
> has been simplified.  Before, both a "packed" and a "struct"
> pointer-to-shared representation was supported.  Now, only
> the "struct" format is supported and various configuration
> options for tweaking field sizes and such have been removed.
> 
> * In keeping with current GCC development guidelines
> target macros are no longer used.  Rather, where needed,
> target hooks are defined and used.
> 
> * FIXME's and TODO's were either fixed or cleaned up.
> 
> * The copyright and license notices were updated.
> 
> * The code was reviewed for conformance to coding standards and updated.
> 
> * Diagnostics now use appropriate format strings rather than building
> up the strings with sprintf().
> 
> * Files in c-family/ no longer include c-tree.h to conform with modularization
> improvements.
> 
> * Most of the #ifdef conditionals have been removed.  Some target hooks
> have been defined and documented in tm.texi.
> 
> * The code was reviewed to verify that it conforms with
> current GCC coding practices and that it incorporates cleanups
> done in the past several years.
> 
> * Comments were added to most new functions, and typos and
> spelling errors in comments were fixed.
> 
> * Changes that appeared in the diff's that were unrelated to UPC
> were removed or incorporated into the trunk.
> 
> * The linkage to the libgupc library was changed to use the newly
> defined method (used in libgomp/libgo for example) of including
> library 'spec' files.  This led to a simplification where we no
> longer needed to add UPC-specific spec. files in various
> target-specific config. directories.
> 
> Introduction: UPC-related Changes
> ---------------------------------
> 
> Below, various UPC-related changes are summarized.
> This introduction is provided as background for review of the UPC
> changes implemented in the GUPC branch.  Each individual change will be
> discussed in more detail in the patch sets found in the following emails.
> 
> The current GUPC branch is based upon a recent version of the GCC trunk
> and has been bootstrapped on x86_64/i686 Linux, x86_64
> Darwin, IA64/Altix Linux, PowerPC Power7 (big endian), and Power8
> (little endian).  Also some testing has been done on various flavors
> of BSD and Solaris and in the past MIPS was tested and supported.
> 
> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.
> 
> The GUPC branch is described here:
>   http://gcc.gnu.org/projects/gupc.html
> 
> The UPC-related source code differences are summarized here:
>   http://gccupc.org/gupc-changes
> 
> In the discussion below, some changes are excerpted in order to
> highlight important aspects of the changes.
> 
> UPC's Shared Qualifier and Layout Qualifier
> -------------------------------------------
> 
> The UPC language specification describes
> the language syntax and semantics:
>   http://upc.lbl.gov/publications/upc-spec-1.3.pdf
> 
> UPC introduces a new qualifier, "shared" that indicates that the
> qualified object is located in a global shared address space that is
> accessible by all UPC threads.  Additional qualifiers ("strict" and
> "relaxed") further specify the semantics of accesses to
> UPC shared objects.
> 
> In UPC, a shared qualified array can optionally specify a "layout
> qualifier" that indicates how the shared data is blocked and
> distributed across UPC threads.
> 
> There are two language pre-defined identifiers that indicate the
> number of threads that will be created when the program starts
> (THREADS) and the current (zero-based) thread number (MYTHREAD).
> Typically, a UPC thread is implemented as an operating system process,
> though they may be mapped to pthreads, when compiled with the
> -fupc-pthreads-model-tls switch.
> 
> Access to UPC shared memory may be implemented locally via OS provided
> facilities (for example, mmap), or across nodes via a high speed
> network inter-connect (for example, Infiniband).
> 
> GUPC provides a runtime (libgupc) that targets an SMP-based system
> that uses mmap() to implement global shared memory.
> 
> Optionally, GUPC can use the more general and more capable Berkeley
> UPCR runtime:
>   http://upc.lbl.gov/download/source.shtml#runtime
> The UPCR runtime supports a number of network
> topologies, and has been ported to most of the
> current High Performance Computing (HPC) systems.
> 
> The following example illustrates the use of the UPC "shared" qualifier
> combined with a layout qualifier.
> 
>     #define BLKSIZE 5
>     #define N_PER_THREAD (4 * BLKSIZE)
>     shared [BLKSIZE] double A[N_PER_THREAD*THREADS];
> 
> Above the "[BLKSIZE]" construct is the UPC layout factor; this
> specifies that the shared array, A, distributes its elements across
> each thread in blocks of 5 elements.  If the program is run with two
> threads, then A is distributed as shown below:
> 
>     Thread 0	Thread 1
>     --------	---------
>     A[ 0.. 4]	A[ 5.. 9]
>     A[10..14]	A[15..19]
>     A[20..24]	A[25..29]
>     A[30..34]	A[35..39]
> 
> The elements shown for thread 0 are defined as having "affinity"
> to thread 0.  Similarly, those elements shown for thread 1 have
> affinity to thread 1.  In UPC, a pointer to a shared object can be
> cast to a thread local pointer (a "C" pointer), when the designated
> shared object has affinity to the referencing thread.
> 
> A UPC "pointer-to-shared" (PTS) is a pointer that references a UPC
> shared object.  A UPC pointer-to-shared is a "fat" pointer with the
> following logical fields:
>    (virt_addr, thread, phase)
> 
> The virtual address (virt_addr) field is combined with the thread
> number (thread) to derive the location of the referenced object
> within the UPC shared address space.  The phase field is used
> keep track of the current block offset for PTS's that have
> blocking factor that is greater than one.
> 
> GUPC implements pointer-to-shared objects using a "struct" representation.
> Until recently, GUPC also supported a "packed" representation, which
> is more space efficient, but limits the range of various fields
> in the UPC pointer-to-shared representation.  We have decided to
> support only the "struct" representation so that the compiler uses
> a single ABI that supports the full range of addresses, threads,
> and blocking factors.
> 
> GCC's internal tree representation is extended to record the UPC
> "shared", "strict", "relaxed" qualifiers, and the layout qualifier.
> 
> --- gcc/tree-core.h     (.../trunk)     (revision 228959)
> +++ gcc/tree-core.h     (.../branches/gupc)     (revision 229159)
> @@ -470,7 +470,11 @@ enum cv_qualifier {
>    TYPE_QUAL_CONST    = 0x1,
>    TYPE_QUAL_VOLATILE = 0x2,
>    TYPE_QUAL_RESTRICT = 0x4,
> -  TYPE_QUAL_ATOMIC   = 0x8
> +  TYPE_QUAL_ATOMIC   = 0x8,
> +  /* UPC qualifiers */
> +  TYPE_QUAL_SHARED   = 0x10,
> +  TYPE_QUAL_RELAXED  = 0x20,
> +  TYPE_QUAL_STRICT   = 0x40
>  };
> [...]
> @@ -857,9 +875,14 @@ struct GTY(()) tree_base {
>        unsigned user_align : 1;
>        unsigned nameless_flag : 1;
>        unsigned atomic_flag : 1;
> -      unsigned spare0 : 3;
> -
> -      unsigned spare1 : 8;
> +      unsigned shared_flag : 1;
> +      unsigned strict_flag : 1;
> +      unsigned relaxed_flag : 1;
> +
> +      unsigned threads_factor_flag : 1;
> +      unsigned block_factor_0 : 1;
> +      unsigned block_factor_x : 1;
> +      unsigned spare1 : 5;
> 

You claim bits in tree_base - are those bits really used for
all tree kinds?  The qualifiers look type-specific, so perhaps
FE-specific flags in the type-lang-specific parts could
have been used (yeah, there are no spare bits in tree_type_*).
Similarly, the _factor stuff should not be on all tree kinds.

I find the names used a bit unspecific, please consider
prefixing them with upc_ (esp. shared_flag may be confused
with the similar private_flag).

Are these and the new tree codes below living beyond the time
the frontend is in control?  That is, do they need to survive
throughout the middle-end?

Thanks,
Richard.

> UPC defines a few additional tree node types:
> 
> --- gcc/c-family/c-common.def   (.../trunk)     (revision 228959)
> +++ gcc/c-family/c-common.def   (.../branches/gupc)     (revision 229159)
> @@ -62,6 +62,24 @@ DEFTREECODE (SIZEOF_EXPR, "sizeof_expr",
>     Operand 3 is the stride.  */
>  DEFTREECODE (ARRAY_NOTATION_REF, "array_notation_ref", tcc_reference, 4)
> 
> +/* Used to represent a `upc_forall' statement. The operands are
> +   UPC_FORALL_INIT_STMT, UPC_FORALL_COND, UPC_FORALL_EXPR,
> +   UPC_FORALL_BODY, and UPC_FORALL_AFFINITY respectively. */
> +
> +DEFTREECODE (UPC_FORALL_STMT, "upc_forall_stmt", tcc_statement, 5)
> +
> +/* Used to represent a UPC synchronization statement. The first
> +   operand is the synchronization operation, UPC_SYNC_OP:
> +   UPC_SYNC_NOTIFY_OP  1       Notify operation
> +   UPC_SYNC_WAIT_OP    2       Wait operation
> +   UPC_SYNC_BARRIER_OP 3       Barrier operation
> +
> +   The second operand, UPC_SYNC_ID is the (optional) expression
> +   whose value specifies the barrier identifier which is checked
> +   by the various synchronization operations. */
> +
> +DEFTREECODE (UPC_SYNC_STMT, "upc_sync_stmt", tcc_statement, 2)
> +
> 
> The "C" parser is extended to recognize UPC's syntactic extensions.
> 
> --- gcc/c-family/c-common.c     (.../trunk)     (revision 228959)
> +++ gcc/c-family/c-common.c     (.../branches/gupc)     (revision 229159)
> @@ -412,8 +426,9 @@ static int resort_field_decl_cmp (const
>     C --std=c89: D_C99 | D_CXXONLY | D_OBJC | D_CXX_OBJC
>     C --std=c99: D_CXXONLY | D_OBJC
>     ObjC is like C except that D_OBJC and D_CXX_OBJC are not set
> -   C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC
> -   C++ --std=c0x: D_CONLY | D_OBJC
> +   UPC is like C except that D_UPC is not set
> +   C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC | D_UPC
> +   C++ --std=c0x: D_CONLY | D_OBJC | D_UPC
>     ObjC++ is like C++ except that D_OBJC is not set
> [...]
> @@ -629,6 +644,19 @@ const struct c_common_resword c_common_r
>    { "inout",           RID_INOUT,              D_OBJC },
>    { "oneway",          RID_ONEWAY,             D_OBJC },
>    { "out",             RID_OUT,                D_OBJC },
> +
> +  /* UPC keywords */
> +  { "shared",          RID_SHARED,             D_UPC },
> +  { "relaxed",         RID_RELAXED,            D_UPC },
> +  { "strict",          RID_STRICT,             D_UPC },
> +  { "upc_barrier",     RID_UPC_BARRIER,        D_UPC },
> +  { "upc_blocksizeof", RID_UPC_BLOCKSIZEOF,    D_UPC },
> +  { "upc_elemsizeof",  RID_UPC_ELEMSIZEOF,     D_UPC },
> +  { "upc_forall",      RID_UPC_FORALL,         D_UPC },
> +  { "upc_localsizeof", RID_UPC_LOCALSIZEOF,    D_UPC },
> +  { "upc_notify",      RID_UPC_NOTIFY,         D_UPC },
> +  { "upc_wait",                RID_UPC_WAIT,           D_UPC },
> +
> 
> --- gcc/c/c-parser.c    (.../trunk)     (revision 228959)
> +++ gcc/c/c-parser.c    (.../branches/gupc)     (revision 229159)
> [...]
> +/* These UPC parser functions are only ever called when
> +   compiling UPC.  */
> +static void c_parser_upc_forall_statement (c_parser *);
> +static void c_parser_upc_sync_statement (c_parser *, int);
> +static void c_parser_upc_shared_qual (source_location,
> +                                      c_parser *,
> +                                     struct c_declspecs *);
> +
> [...]
> +        /* UPC qualifiers */
> +       case RID_SHARED:
> +         attrs_ok = true;
> +         c_parser_upc_shared_qual (loc, parser, specs);
> +         break;
> +       case RID_STRICT:
> +       case RID_RELAXED:
> +         attrs_ok = true;
> +         declspecs_add_qual (loc, specs, c_parser_peek_token (parser)->value);
> +         c_parser_consume_token (parser);
> +         break;
> [...]
> +  /* Process all #pragma's just after the opening brace.  This
> +     handles #pragma upc, which can only appear just after
> +     the opening brace, when it appears within a function body.  */
> +  push_upc_consistency_mode ();
> +  permit_pragma_upc ();
> +  while (c_parser_next_token_is (parser, CPP_PRAGMA))
> +    {
> +      location_t loc ATTRIBUTE_UNUSED = c_parser_peek_token
> (parser)->location;
> +      if (c_parser_pragma (parser, pragma_compound))
> +        last_label = false, last_stmt = true;
> +      parser->error = false;
> +    }
> +  deny_pragma_upc ();
> [...]
> +       case RID_UPC_FORALL:
> +          gcc_assert (flag_upc);
> +         c_parser_upc_forall_statement (parser);
> +         break;
> +        case RID_UPC_NOTIFY:
> +          gcc_assert (flag_upc);
> +         c_parser_upc_sync_statement (parser, UPC_SYNC_NOTIFY_OP);
> +         goto expect_semicolon;
> +        case RID_UPC_WAIT:
> +          gcc_assert (flag_upc);
> +         c_parser_upc_sync_statement (parser, UPC_SYNC_WAIT_OP);
> +         goto expect_semicolon;
> +        case RID_UPC_BARRIER:
> +          gcc_assert (flag_upc);
> +         c_parser_upc_sync_statement (parser, UPC_SYNC_BARRIER_OP);
> +         goto expect_semicolon;
> [...]
>         case RID_SIZEOF:
>           return c_parser_sizeof_expression (parser);
> +       case RID_UPC_BLOCKSIZEOF:
> +       case RID_UPC_ELEMSIZEOF:
> +       case RID_UPC_LOCALSIZEOF:
> +          gcc_assert (flag_upc);
> +         return c_parser_sizeof_expression (parser);
> [...]
> 
> --- gcc/c-family/c-pragma.c     (.../trunk)     (revision 228959)
> +++ gcc/c-family/c-pragma.c     (.../branches/gupc)     (revision 229159)
> [...]
> +/*
> + *  #pragma upc strict
> + *  #pragma upc relaxed
> + *  #pragma upc upc_code
> + *  #pragma upc c_code
> + */
> +static void
> +handle_pragma_upc (cpp_reader * ARG_UNUSED (dummy))
> +{
> [...]
> 
> c-decl.c handles the additional UPC qualifiers and declspecs.
> The layout qualifier is handled here:
> 
> --- gcc/c/c-decl.c      (.../trunk)     (revision 228959)
> +++ gcc/c/c-decl.c      (.../branches/gupc)     (revision 229159)
> [...]
> +  /* A UPC layout qualifier is encoded as an ARRAY_REF,
> +     further, it implies the presence of the 'shared' keyword. */
> +  if (TREE_CODE (qual) == ARRAY_REF)
> +    {
> +      if (specs->upc_layout_qualifier)
> +        {
> +          error ("two or more layout qualifiers specified");
> +          return specs;
> +        }
> +      else
> +        {
> +          specs->upc_layout_qualifier = qual;
> +          qual = ridpointers[RID_SHARED];
> +        }
> +    }
> 
> In UPC, a qualifier includes both the traditional
> "C" qualifier flags and the UPC "layout qualifier".
> Thus, the pointer_quals field of a declarator node
> is defined as a struct including both qualifier
> flags and the UPC type qualifier, as shown below.
> 
>             /* Process type qualifiers (such as const or volatile)
>                that were given inside the `*'.  */
> -           type_quals = declarator->u.pointer_quals;
> +           type_quals = declarator->u.pointer.quals;
> +           upc_layout_qualifier = declarator->u.pointer.upc_layout_qual;
> +           sharedp = ((type_quals & TYPE_QUAL_SHARED) != 0);
> 
> UPC shared variables are allocated at runtime in the global memory
> that is allocated and managed by the UPC runtime.  A separate link
> section is used as a method of assigning virtual addresses to UPC
> shared variables.  The UPC shared variable section is designated as a
> "no load" section on systems that support that facility; in that case,
> the linkage section begins at virtual address zero.  The logic below
> assigns UPC shared variables to their own linkage section.
> 
> +    /* Shared variables are given their own link section on
> +       most target platforms, and if compiling in pthreads mode
> +       regular local file scope variables are made thread local. */
> +    if ((TREE_CODE(decl) == VAR_DECL)
> +        && !threadp && (TREE_SHARED (decl) || flag_upc_pthreads))
> +      upc_set_decl_section (decl);
> +
> 
> Patches
> -------
> 
> The patches are organized into the following categories
> and will be sent out as separate email messages.
> 
> [UPC 01/22] front-end changes
> [UPC 02/22] tree-related changes
> [UPC 03/22] options processing, driver
> [UPC 04/22] Make, Config changes
> [UPC 05/22] language hooks changes
> [UPC 06/22] target hooks
> [UPC 07/22] lowering, pointer-to-shared ops
> [UPC 08/22] target - Darwin
> [UPC 09/22] target - x86
> [UPC 10/22] target - rs6000
> [UPC 11/22] documentation
> [UPC 12/22] DWARF support
> [UPC 13/22] C++ changes
> [UPC 14/22] constant folding changes
> [UPC 15/22] RTL changes
> [UPC 16/22] gimple/gimplify changes
> [UPC 17/22] misc/common changes
> [UPC 18/22] libatomic changes
> [UPC 19/22] libgupc - Make, Configure
> [UPC 20/22] libgupc runtime library
> [UPC 21/22] gcc.dg test suite
> [UPC 22/22] libgupc test suite
> 
> thanks,
> - Gary
> 
>
Bernd Schmidt Dec. 1, 2015, 11:19 a.m. UTC | #2
On 12/01/2015 06:31 AM, Gary Funck wrote:
> At this time, we would like to re-submit the UPC patches for comment
> with the goal of introducing these changes into GCC 6.0.

This has missed stage 1 by a few weeks; we'd have to make an exception
to include it at this late stage.

> @@ -857,9 +875,14 @@ struct GTY(()) tree_base {
>         unsigned user_align : 1;
>         unsigned nameless_flag : 1;
>         unsigned atomic_flag : 1;
> -      unsigned spare0 : 3;
> -
> -      unsigned spare1 : 8;
> +      unsigned shared_flag : 1;
> +      unsigned strict_flag : 1;
> +      unsigned relaxed_flag : 1;
> +
> +      unsigned threads_factor_flag : 1;
> +      unsigned block_factor_0 : 1;
> +      unsigned block_factor_x : 1;
> +      unsigned spare1 : 5;

That's a lot of bits used up at once.

Does this solve anything that cannot be done with OpenMP, which we 
already support? Can you show us any users of this that demonstrate that 
this is actually in use by anyone outside the universities responsible 
for UPC? The language standard is apparently from 2005, but I've never 
heard of it and googling "upc" does not give any sensible results. The 
gccupc mailing list seems to have been dead for years judging by the 
archives. I'm worried we'll end up carrying something around as a burden 
that is of no practical use (considering we already support the more 
widespread OpenMP).


Bernd
Jeff Law Dec. 1, 2015, 1:58 p.m. UTC | #3
On 12/01/2015 04:12 AM, Richard Biener wrote:
> On Mon, 30 Nov 2015, Gary Funck wrote:
>
>>
>> Some time ago, we submitted an RFC for the introduction of
>> UPC support into GCC.  During the intervening time period,
>> we have continued to keep the 'gupc' (GNU UPC) branch in sync
>> with the GCC trunk and have incorporated feedback and contributions from
>> various GCC developers (Joseph Myers, Tom Tromey, Jakub Jelinek,
>> Richard Henderson, Meador Inge, and others).  We have also implemented
>> various bug fixes and improvements.
>>
>> At this time, we would like to re-submit the UPC patches for comment
>> with the goal of introducing these changes into GCC 6.0.
>
> First of all let me say that it is IMNSHO now too late for GCC 6.
Agreed.  I put it in my queue of stuff to look at in the spring when 
development opens for GCC 7.

jeff
Andi Kleen Dec. 1, 2015, 3:11 p.m. UTC | #4
Bernd Schmidt <bschmidt@redhat.com> writes:

> I'm worried we'll end up carrying
> something around as a burden that is of no practical use (considering
> we already support the more widespread OpenMP).

I'm not an expert on UPC, but from glancing over the description it
seems to target a distributed message-passing programming model,
which is very different from OpenMP. I don't think any of the existing
parallelization models in GCC (OpenMP, Cilk) support that niche.

-Andi
Richard Biener Dec. 1, 2015, 3:23 p.m. UTC | #5
On Tue, 1 Dec 2015, Andi Kleen wrote:

> Bernd Schmidt <bschmidt@redhat.com> writes:
> 
> > I'm worried we'll end up carrying
> > something around as a burden that is of no practical use (considering
> > we already support the more widespread OpenMP).
> 
> I'm not an expert on UPC, but from glancing over the description it
> seems to target a distributed message-passing programming model,
> which is very different from OpenMP. I don't think any of the existing
> parallelization models in GCC (OpenMP, Cilk) support that niche.

Fortran CoArrays do though.  Ok, slightly irrelevant...

Btw, I don't think we should talk about "no practical use" given
we took OpenACC.

Richard.
Gary Funck Dec. 1, 2015, 4:34 p.m. UTC | #6
On 12/01/15 12:12:29, Richard Biener wrote:
> On Mon, 30 Nov 2015, Gary Funck wrote:
> > At this time, we would like to re-submit the UPC patches for comment
> > with the goal of introducing these changes into GCC 6.0.
>
>  First of all let me say that it is IMNSHO now too late for GCC 6.

I realize that stage 1 recently closed, and that if UPC were
accepted for inclusion, it would be an exception.  To offset
potential risk, we perform weekly merges and run a large suite
of tests and applications on different host and CPU architectures.
We have also tried to follow the re-factoring and C++ changes
made over the course of the last year or so.  I'd just ask
that the changes be given some further consideration for 6.0.

> You claim bits in tree_base - are those bits really used for
> all tree kinds?  The qualifiers look type specific where
> eventually FE specific flags in type-lang-specific parts could
> have been used (yeah, there are no spare bits in tree_type_*).
> Similar the _factor stuff should not be on all tree kinds.

When we first started building the gupc branch, it was suggested
that UPC be implemented as a separate language ala ObjC.
In that case, we used "language bits".  Over time, this approach
fell out of favor, and we were asked to move everything into
the C front-end and middle-end, making compilation contingent
upon -fupc, which is the way it is now.  Also, over the past
couple of years, there has been work to minimize the number of
bits used by tree nodes, so some additional changes were needed.

The main change recommended to reduce tree space was moving the
"layout factor" (blocking factor) out of the tree node, and using
only two bits there, one bit for a relatively common case of 0,
and the other for > 1.  It was suggested that we use a hash
table to map tree nodes to layout qualifiers for the case they
are > 1.  This necessitated using a garbage collected tree map,
which unfortunately meant that tree nodes needed special garbage
collection logic.
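
As a rough illustration of that encoding (not code from the branch),
the sketch below stores a blocking factor using two flag bits plus a
side table.  The names 'struct node', 'set_block_factor' and
'get_block_factor' are hypothetical, and a small linear array stands
in for the real hash table to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node; only the two flag bits matter.  */
struct node {
  unsigned block_factor_0 : 1;  /* blocking factor is 0 ("indefinite") */
  unsigned block_factor_x : 1;  /* blocking factor is > 1; see side table */
};

/* Hypothetical side table; a real implementation would use a hash map.  */
#define MAX_ENTRIES 16
static struct { const struct node *key; unsigned long factor; }
  side_table[MAX_ENTRIES];
static size_t side_count;

static void
set_block_factor (struct node *n, unsigned long factor)
{
  n->block_factor_0 = (factor == 0);
  n->block_factor_x = (factor > 1);
  if (factor <= 1)
    return;                     /* 0 and 1 need no side-table entry */
  for (size_t i = 0; i < side_count; i++)
    if (side_table[i].key == n)
      {
        side_table[i].factor = factor;  /* update an existing entry */
        return;
      }
  assert (side_count < MAX_ENTRIES);
  side_table[side_count].key = n;
  side_table[side_count].factor = factor;
  side_count++;
}

static unsigned long
get_block_factor (const struct node *n)
{
  if (n->block_factor_0)
    return 0;
  if (!n->block_factor_x)
    return 1;                   /* default: neither bit is set */
  for (size_t i = 0; i < side_count; i++)
    if (side_table[i].key == n)
      return side_table[i].factor;
  assert (!"block_factor_x set but no side-table entry");
  return 1;
}
```

With this scheme, only types whose blocking factor is greater than 1
pay for a side-table entry.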

It is worth noting that the "layout qualifier" is an integral
constant, currently represented as a tree node reference.
It might be possible to represent it as a "wide int" instead.
I did give that a go once, but it rippled through the code
making some things awkward.  Perhaps not as awkward as a
custom tree node GC routine; this could be re-visited.

> I find the names used a bit unspecific, please consider
> prefixing them with upc_ (esp. shared_flag may be confused
> with the similar private_flag).

When we previously asked for a review, it was noted that
if the UPC bits were moved into what amounts to common/generic
tree node fields that we should drop UPC_ or upc_ from the
related node names and functions.  That's what we did.
There is some middle ground, for example, where only
TYPE_SHARED_P() is renamed to UPC_SHARED_TYPE_P()
and the rest remain as is.

Since renames are straightforward, we can make any
recommended changes quickly.

Originally, we were keeping the door open for UPC++, but
there are complications with generalizing C++ into a multi-node
environment, and that idea has been tabled for now.
Therefore, the current structure/implementation is C only,
with most of the new front-end/middle-end logic under
the c/ directory.

> Are these and the new tree codes below living beyond the time
> the frontend is in control?  That is, do they need to survive
> throughout the middle-end?

I'm not sure where the line is drawn for the front-end and middle-end.
After upc_genericize() runs (just before c_genericize())
all operations on tree nodes that are UPC-specific are lowered
into operations on the internal representation of a pointer-to-shared
and/or runtime calls that operate on the internal representation.
The pointer-to-shared values/types still show up
in the tree, but only as containers (pointers-to-shared
are typically 2x the size of a regular "C" pointer).
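
To make the "container" idea concrete, here is a minimal sketch of a
struct-style pointer-to-shared and of how an array index could be
mapped onto it.  The field names and widths, and the helper
'pts_index', are assumptions for illustration and are not taken from
the branch; the round-robin distribution of blocks over threads
follows the usual UPC layout rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "struct" pointer-to-shared; fields are illustrative.  */
struct pts {
  void *vaddr;      /* address within the owning thread's shared data */
  unsigned thread;  /* thread that the element has affinity to */
  unsigned phase;   /* position of the element within its block */
};

/* Locate element INDEX of a shared array whose elements are ELEM_SIZE
   bytes, laid out in blocks of BLOCK elements that are dealt out
   round-robin to THREADS threads.  BASE is the start of each thread's
   local portion of the array.  */
static struct pts
pts_index (char *base, size_t elem_size, size_t index,
           size_t block, size_t threads)
{
  assert (block >= 1 && threads >= 1);
  size_t block_num = index / block;
  struct pts p;
  p.phase = (unsigned) (index % block);
  p.thread = (unsigned) (block_num % threads);
  /* Element offset within the owning thread's local portion.  */
  size_t local_elem = (block_num / threads) * block + p.phase;
  p.vaddr = base + local_elem * elem_size;
  return p;
}
```

For example, with a blocking factor of 2 and 3 threads, element 7
falls in block 3, which wraps around to thread 0 at phase 1.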

The places where SHARED_TYPE_P() is referenced in 'c/'
and 'c-family/' are:

    c/c-convert.c
    c/c-objc-common.c
    c/c-upc-pts-ops.c
    c/c-parser.c
    c/c-typeck.c
    c/c-upc-low.c
    c/c-upc-lang.c
    c/c-decl.c
    c/c-upc.c
    c-family/c-common.c

The places in the gcc top-level where SHARED_TYPE_P()
is referenced are:

    convert.c
    explow.c
    fold-const.c
    function.c
    gimple-expr.c
    match.pd
    tree.c
    tree.h
    tree-sra.c

The target-specific references are here:

    config/rs6000/rs6000.c
    config/i386/i386.c

All of the references outside of c/ and c-family/
and tree.[ch] are to differentiate operations on UPC pointers-to-shared from
regular "C" pointers.  (Some/all of those references might
be mitigated by defining new language hooks.  We haven't looked
into that.)

It may be the case that, in the current design, only
the "shared" bit is needed in the common (base?) tree node,
as long as there is some way to record the additional
"strict" and "relaxed" qualifiers, the "layout qualifier"
(a tree reference to an integral constant), and the
"THREADS scaled" bit.  The main thing that would need to
be checked is when/where debugging (DWARF) info is generated.

thanks,
- Gary
Richard Biener Dec. 2, 2015, 9:40 a.m. UTC | #7
On Tue, 1 Dec 2015, Gary Funck wrote:

> On 12/01/15 12:12:29, Richard Biener wrote:
> > On Mon, 30 Nov 2015, Gary Funck wrote:
> > > At this time, we would like to re-submit the UPC patches for comment
> > > with the goal of introducing these changes into GCC 6.0.
> >
> >  First of all let me say that it is IMNSHO now too late for GCC 6.
> 
> I realize that stage 1 recently closed, and that if UPC were
> accepted for inclusion, it would be an exception.  To offset
> potential risk, we perform weekly merges and run a large suite
> of tests and apps. on differing hosts/cpu architectures.
> We have also tried to follow the sorts of re-factoring and C++
> changes made over the course of the last year/so.  I'd just ask
> that the changes be given some further consideration for 6.0.
> 
> > You claim bits in tree_base - are those bits really used for
> > all tree kinds?  The qualifiers look type specific where
> > eventually FE specific flags in type-lang-specific parts could
> > have been used (yeah, there are no spare bits in tree_type_*).
> > Similar the _factor stuff should not be on all tree kinds.
> 
> When we first started building the gupc branch, it was suggested
> that UPC be implemented as a separate language ala ObjC.
> In that case, we used "language bits".  Over time, this approach
> fell out of favor, and we were asked to move everything into
> the C front-end and middle-end, making compilation contingent
> upon -fupc, which is the way it is now.  Also, over the past
> couple of years, there has been work to minimize the number of
> bits used by tree nodes, so some additional changes were needed.
> 
> The main change recommended to reduce tree space was moving the
> "layout factor" (blocking factor) out of the tree node, and using
> only two bits there, one bit for a relatively common case of 0,
> and the other for > 1.  It was suggested that we use a hash
> table to map tree nodes to layout qualifiers for the case they
> are > 1.  This necessitated using a garbage collected tree map,
> which unfortunately meant that tree nodes needed special garbage
> collection logic.

I still don't see why it needs special garbage collection logic.
We have many tree -> X maps that just get away without.

> It is worth noting that the "layout qualifier" is an integral
> constant, currently represented as a tree node reference.
> It might be possible to represent it as a "wide int" instead.
> I did give that a go once, but it rippled through the code
> making some things awkward.  Perhaps not as awkward as a
> custom tree node GC routine; this could be re-visited.

As said, I don't see why you need special GC logic
at all.  Please explain.

> > I find the names used a bit unspecific, please consider
> > prefixing them with upc_ (esp. shared_flag may be confused
> > with the similar private_flag).
> 
> When we previously asked for a review, it was noted that
> if the UPC bits were moved into what amounts to common/generic
> tree node fields that we should drop UPC_ or upc_ from the
> related node names and functions.  That's what we did.
> There is some middle ground, for example, where only
> TYPE_SHARED_P() is renamed to UPC_SHARED_TYPE_P()
> and the rest remain as is.
> 
> Since renames are straight forward, we can make any
> recommended changes quickly.
> 
> Originally, we were keeping the door open for UPC++, but
> there are complications with generalizing C++ into a multi-node
> environment, and that idea has been tabled for now.
> Therefore, the current structure/implementation is C only,
> with most of the new front-end/middle-end logic under
> the c/ directory.
> 
> > Are these and the new tree codes below living beyond the time
> > the frontend is in control?  That is, do they need to survive
> > throughout the middle-end?
> 
> I'm not sure where the line is drawn for the front-end and middle-end.
> After upc_genericize() runs (just before c_genericize())
> all operations on tree nodes that are UPC-specific are lowered
> into operations on the internal representation of a pointer-to-shared
> and/or runtime calls that operate on the internal representation.
> The pointer-to-shared values/types still show up
> in the tree, but only as containers (pointers-to-shared
> are typically 2x the size of a regular "C" pointer).

The line between FE and middle-end is indeed a bit of a grey area.
I am considering everything after gimplification middle-end.  This
means that UPC lowering is done in the frontend.  And if indeed
none of the "special" pointers survive to middle-end code then
more of the bits needed could go into on-the-side structures.

I'd have done a

hash_map<tree, upc_state>

and put all of the UPC state in there.  And just hash the tree
by pointer.
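
Roughly like the following, written as a plain C sketch rather than
with GCC's hash_map so that it stands alone.  The contents of
'upc_state', the fixed table size and the function names are all
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node UPC state kept outside the tree node.  */
struct upc_state {
  unsigned shared : 1;
  unsigned strict : 1;
  unsigned relaxed : 1;
  unsigned threads_factor : 1;
  unsigned long block_factor;
};

#define TABLE_SIZE 64  /* power of two; fixed size, no resizing */

struct slot { const void *key; struct upc_state value; };
static struct slot table[TABLE_SIZE];
static size_t table_used;

/* Hash a node by its address: drop low alignment bits, then mask.  */
static size_t
hash_pointer (const void *p)
{
  return (size_t) ((uintptr_t) p >> 3) & (TABLE_SIZE - 1);
}

/* Return the state for KEY.  If KEY is absent, insert a zeroed entry
   when INSERT is nonzero, otherwise return NULL.  */
static struct upc_state *
upc_state_lookup (const void *key, int insert)
{
  assert (key != NULL);
  size_t i = hash_pointer (key);
  while (table[i].key != NULL)  /* linear probing */
    {
      if (table[i].key == key)
        return &table[i].value;
      i = (i + 1) & (TABLE_SIZE - 1);
    }
  if (!insert)
    return NULL;
  /* Keep one slot empty so that probing always terminates.  */
  assert (table_used < TABLE_SIZE - 1);
  table[i].key = key;
  table_used++;
  return &table[i].value;
}
```

Nodes that carry no UPC state never get an entry, so nothing in
tree_base has to change for them.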

I realize that by using the C frontend you think you need to
make the UPC types variants of the C type.  I'm questioning that
but don't know too much about the issue (not needing that would
avoid exposing the UPC types to tree.[ch]).

Of course much of the middle-end overlap comes from the fact
that the C family frontends share 'tree' with the middle-end.

> The places where SHARED_TYPE_P() is referenced in 'c/'
> and 'c-family/' are:
> 
>     c/c-convert.c
>     c/c-objc-common.c
>     c/c-upc-pts-ops.c
>     c/c-parser.c
>     c/c-typeck.c
>     c/c-upc-low.c
>     c/c-upc-lang.c
>     c/c-decl.c
>     c/c-upc.c
>     c-family/c-common.c
> 
> The places in the gcc top-level where SHARED_TYPE_P()
> is referenced are:
> 
>     convert.c
>     explow.c
>     fold-const.c
>     function.c
>     gimple-expr.c
>     match.pd
>     tree.c
>     tree.h
>     tree-sra.c
> 
> The target-specific references are here:
> 
>     config/rs6000/rs6000.c
>     config/i386/i386.c

I'll note that this is only for an ABI extension which could be
more "easily" implemented by the frontend lowering by simply
extracting the two pieces and using two parameters?  Note that
at least x86_64 already passes such structures in registers.
What exactly does your custom-built structure look like?

As a side-note I doubt that special argument passing conventions
are required to get performance out of a scheme that eventually
goes over the network or does most of the processing in loops.

> All of the references outside of c/ and c-family/
> and tree.[ch] are to differentiate operations on UPC pointers-to-shared from
> regular "C" pointers.  (Some/all of those references might
> be mitigated by defining new language hooks.  We haven't looked
> into that.)

But as I understand from above those pointers-to-shared are lowered
to "non-pointers"?  Why can't you make them non-pointers in the first
place?  I suppose integrating with the rest of the C FE is easier
if you make them appear as pointers?

> It may be the case that in the current design, that only
> the "shared" bit is needed in the common (base?) tree node,
> as long as there is some way to record the additional
> "strict" and "relaxed" qualifiers, the "layout qualifier"
> (a tree reference to an integral constant), and the
> "THREADS scaled" bit.  The main thing that would need to
> be checked is when/where debugging (DWARF) info. is generated.

DWARF info for types nowadays is generated early so a langhook
can extract extra data.

Thanks,
Richard.
Gary Funck Dec. 2, 2015, 5:08 p.m. UTC | #8
On 12/01/15 12:19:48, Bernd Schmidt wrote:
> On 12/01/2015 06:31 AM, Gary Funck wrote:
> >At this time, we would like to re-submit the UPC patches for comment
> >with the goal of introducing these changes into GCC 6.0.
> 
> This has missed stage 1 by a few weeks, we'd have to make an exception to
> include it at this late stage.

Based upon the feedback, it looks like GCC 6.0 is not feasible.

> 
> >@@ -857,9 +875,14 @@ struct GTY(()) tree_base {
> >        unsigned user_align : 1;
> >        unsigned nameless_flag : 1;
> >        unsigned atomic_flag : 1;
> >-      unsigned spare0 : 3;
> >-
> >-      unsigned spare1 : 8;
> >+      unsigned shared_flag : 1;
> >+      unsigned strict_flag : 1;
> >+      unsigned relaxed_flag : 1;
> >+
> >+      unsigned threads_factor_flag : 1;
> >+      unsigned block_factor_0 : 1;
> >+      unsigned block_factor_x : 1;
> >+      unsigned spare1 : 5;
> 
> That's a lot of bits used up at once.

I provided some additional background in my reply to Richard.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00136.html

> Does this solve anything that cannot be done with OpenMP, which we already
> support?

UPC's target use is similar to that of Co-Array Fortran (CAF):
mainly multi-node computations in addition to multi-core.
Also, because UPC extends the language syntax, it is arguably
easier to write and to understand UPC programs than programs
using the pragma-based approach of OpenMP and Cilk, or
library-based solutions like MPI.

Here is an example application, parallel merge sort,
written in MPI, OpenMP, UPC, hybrid MPI/OpenMP,
and hybrid UPC/OpenMP, which illustrates how they
compare in terms of expressivity.  In general, the
bulk-synchronous UPC implementation out-performs
the others.

https://github.com/gary-funck/parallel-merge-sort

> Can you show us any users of this that demonstrate that this is
> actually in use by anyone outside the universities responsible for UPC?

It is primarily used in universities and research labs.
Cray, IBM, and HP offer their own commercial compilers on
their HPC platforms.  Berkeley has an open UPC-to-C translator
and we have separately built a Clang-based compiler and
source-to-source translator.

> The language standard is apparently from 2005 [...]

The spec was updated in 2013.
http://upc.lbl.gov/publications/upc-spec-1.3.pdf

> but I've never heard of it and
> googling "upc" does not give any sensible results. The gccupc mailing list
> seems to have been dead for years judging by the archives. I'm worried we'll
> end up carrying something around as a burden that is of no practical use
> (considering we already support the more widespread OpenMP).

UPC is more similar to Co-Array Fortran (CAF) than OpenMP.
I don't keep up with developments in the OpenMP or OpenACC standards,
so am unaware of proposals to generalize them for multi-node
HPC applications.  As mentioned, IMO, UPC is more expressive
than OpenMP (which is pragma based).  Their programming
models are also different.  UPC is SPMD, and OpenMP uses
dynamic task dispatching.

Regarding a possible "burden", we have tried to modularize
the changes to minimize impact on the compiler.

We floated the idea of including UPC in GCC a few years back;
there were no objections at that time.  In the meantime,
we have been implementing changes based upon feedback,
porting the runtime to other communication layers,
and implementing the changes needed to conform
to the 2013 UPC specification.

thanks,
- Gary
Gary Funck Dec. 5, 2015, 10:57 p.m. UTC | #9
On 12/02/15 10:40:50, Richard Biener wrote:
> On Tue, 1 Dec 2015, Gary Funck wrote:
> > The main change recommended to reduce tree space was moving the
> > "layout factor" (blocking factor) out of the tree node, and using
> > only two bits there, one bit for a relatively common case of 0,
> > and the other for > 1.  It was suggested that we use a hash
> > table to map tree nodes to layout qualifiers for the case they
> > are > 1.  This necessitated using a garbage collected tree map,
> > which unfortunately meant that tree nodes needed special garbage
> > collection logic.
> 
> I still don't see why it needs special garbage collection logic.
> We have many tree -> X maps that just get away without.
> [...]
> As said, I don't see why you need a special GC collection logic
> at all.  Please explain.

The problem we ran into is that we had a tree map which
mapped a tree node to its UPC layout qualifier, which
is an integral constant (CST).  Tree nodes for CST's
are made unique by hashing them into an integer->tree map.
What happened is that the GC would free up entries in the
CST table that were being used by the layout qualifier map.

It seems that the new tree map logic fixes this problem
(as you've noted).  I backed out the custom GC logic and
re-ran a large test that extensively exercises the use case
that triggers the issue -- works like a charm.

thanks,
- Gary
Richard Biener Dec. 6, 2015, 7:57 a.m. UTC | #10
On December 5, 2015 11:57:27 PM GMT+01:00, Gary Funck <gary@intrepid.com> wrote:
>On 12/02/15 10:40:50, Richard Biener wrote:
>> On Tue, 1 Dec 2015, Gary Funck wrote:
>> > The main change recommended to reduce tree space was moving the
>> > "layout factor" (blocking factor) out of the tree node, and using
>> > only two bits there, one bit for a relatively common case of 0,
>> > and the other for > 1.  It was suggested that we use a hash
>> > table to map tree nodes to layout qualifiers for the case they
>> > are > 1.  This necessitated using a garbage collected tree map,
>> > which unfortunately meant that tree nodes needed special garbage
>> > collection logic.
>> 
>> I still don't see why it needs special garbage collection logic.
>> We have many tree -> X maps that just get away without.
>> [...]
>> As said, I don't see why you need a special GC collection logic
>> at all.  Please explain.
>
>The problem we ran into is that we had a tree map which
>mapped a tree node to its UPC layout qualifier, which
>is an integral constant (CST).  Tree nodes for CST's
>are made unique by hashing them into an integer->tree map.
>What happened is that the GC would free up entries in the
>CST table that were being used by the layout qualifier map.
>
>It seems that the new tree map logic fixes this problem
>(as you've noted).  I backed out the custom GC logic and
>re-ran a large test that extensively exercises the use case
>that triggers the issue -- works like a charm.

That's good news!

Richard.

>thanks,
>- Gary
Patch

--- gcc/tree-core.h     (.../trunk)     (revision 228959)
+++ gcc/tree-core.h     (.../branches/gupc)     (revision 229159)
@@ -470,7 +470,11 @@  enum cv_qualifier {
   TYPE_QUAL_CONST    = 0x1,
   TYPE_QUAL_VOLATILE = 0x2,
   TYPE_QUAL_RESTRICT = 0x4,
-  TYPE_QUAL_ATOMIC   = 0x8
+  TYPE_QUAL_ATOMIC   = 0x8,
+  /* UPC qualifiers */
+  TYPE_QUAL_SHARED   = 0x10,
+  TYPE_QUAL_RELAXED  = 0x20,
+  TYPE_QUAL_STRICT   = 0x40
 };
[...]
@@ -857,9 +875,14 @@  struct GTY(()) tree_base {
       unsigned user_align : 1;
       unsigned nameless_flag : 1;
       unsigned atomic_flag : 1;
-      unsigned spare0 : 3;
-
-      unsigned spare1 : 8;
+      unsigned shared_flag : 1;
+      unsigned strict_flag : 1;
+      unsigned relaxed_flag : 1;
+
+      unsigned threads_factor_flag : 1;
+      unsigned block_factor_0 : 1;
+      unsigned block_factor_x : 1;
+      unsigned spare1 : 5;
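
As a small illustration of how the new qualifier bits can be tested,
the sketch below copies the values from the cv_qualifier hunk above
and adds two helpers.  The helper names are hypothetical, and the
consistency rule in 'quals_valid_p' (that 'strict' and 'relaxed'
exclude each other and both require 'shared') is an assumption made
for the example, not a description of what the patch enforces.

```c
#include <assert.h>

/* Values copied from the cv_qualifier hunk.  */
enum cv_qualifier {
  TYPE_QUAL_CONST    = 0x1,
  TYPE_QUAL_VOLATILE = 0x2,
  TYPE_QUAL_RESTRICT = 0x4,
  TYPE_QUAL_ATOMIC   = 0x8,
  TYPE_QUAL_SHARED   = 0x10,
  TYPE_QUAL_RELAXED  = 0x20,
  TYPE_QUAL_STRICT   = 0x40
};

/* Hypothetical helper: nonzero if QUALS includes 'shared'.  */
static int
quals_shared_p (int quals)
{
  return (quals & TYPE_QUAL_SHARED) != 0;
}

/* Hypothetical check, assuming 'strict' and 'relaxed' are mutually
   exclusive and only meaningful on a shared type.  */
static int
quals_valid_p (int quals)
{
  assert ((quals & ~0x7f) == 0);  /* only the bits defined above */
  int strict = (quals & TYPE_QUAL_STRICT) != 0;
  int relaxed = (quals & TYPE_QUAL_RELAXED) != 0;
  if (strict && relaxed)
    return 0;
  if ((strict || relaxed) && !quals_shared_p (quals))
    return 0;
  return 1;
}
```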

UPC defines a few additional tree node types:

--- gcc/c-family/c-common.def   (.../trunk)     (revision 228959)
+++ gcc/c-family/c-common.def   (.../branches/gupc)     (revision 229159)
@@ -62,6 +62,24 @@  DEFTREECODE (SIZEOF_EXPR, "sizeof_expr",
    Operand 3 is the stride.  */
 DEFTREECODE (ARRAY_NOTATION_REF, "array_notation_ref", tcc_reference, 4)

+/* Used to represent a `upc_forall' statement. The operands are
+   UPC_FORALL_INIT_STMT, UPC_FORALL_COND, UPC_FORALL_EXPR,
+   UPC_FORALL_BODY, and UPC_FORALL_AFFINITY respectively. */
+
+DEFTREECODE (UPC_FORALL_STMT, "upc_forall_stmt", tcc_statement, 5)
+
+/* Used to represent a UPC synchronization statement. The first
+   operand is the synchronization operation, UPC_SYNC_OP:
+   UPC_SYNC_NOTIFY_OP  1       Notify operation
+   UPC_SYNC_WAIT_OP    2       Wait operation
+   UPC_SYNC_BARRIER_OP 3       Barrier operation
+
+   The second operand, UPC_SYNC_ID is the (optional) expression
+   whose value specifies the barrier identifier which is checked
+   by the various synchronization operations. */
+
+DEFTREECODE (UPC_SYNC_STMT, "upc_sync_stmt", tcc_statement, 2)
+
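
For readers unfamiliar with upc_forall, the plain C sketch below
mimics what one thread does for a loop with an integer affinity
expression: it walks the whole iteration space but runs the body only
for iterations whose affinity value, taken modulo the thread count,
equals its own thread number.  The function name and parameters are
hypothetical, and only non-negative affinity values are considered.

```c
#include <assert.h>

/* Emulate "upc_forall (i = 0; i < n; i++; i)" for thread MYTHREAD out
   of THREADS.  The loop "body" records I in VISITED, which must have
   room for at least N entries.  Returns the number of iterations run
   by this thread.  */
static int
run_forall (int n, int threads, int mythread, int *visited)
{
  assert (threads > 0 && mythread >= 0 && mythread < threads);
  int count = 0;
  for (int i = 0; i < n; i++)     /* init; condition; increment */
    if (i % threads == mythread)  /* affinity test */
      visited[count++] = i;       /* loop body */
  return count;
}
```

With 10 iterations and 4 threads, thread 1 runs iterations 1, 5
and 9.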

The "C" parser is extended to recognize UPC's syntactic extensions.

--- gcc/c-family/c-common.c     (.../trunk)     (revision 228959)
+++ gcc/c-family/c-common.c     (.../branches/gupc)     (revision 229159)
@@ -412,8 +426,9 @@  static int resort_field_decl_cmp (const
    C --std=c89: D_C99 | D_CXXONLY | D_OBJC | D_CXX_OBJC
    C --std=c99: D_CXXONLY | D_OBJC
    ObjC is like C except that D_OBJC and D_CXX_OBJC are not set
-   C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC
-   C++ --std=c0x: D_CONLY | D_OBJC
+   UPC is like C except that D_UPC is not set
+   C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC | D_UPC
+   C++ --std=c0x: D_CONLY | D_OBJC | D_UPC
    ObjC++ is like C++ except that D_OBJC is not set
[...]
@@ -629,6 +644,19 @@  const struct c_common_resword c_common_r