[og9] Re-do OpenACC private variable resolution
diff mbox series

Message ID 20191016190849.70347-1-julian@codesourcery.com
State New
Headers show
Series
  • [og9] Re-do OpenACC private variable resolution
Related show

Commit Message

Julian Brown Oct. 16, 2019, 7:08 p.m. UTC
This patch (for the openacc-gcc-9-branch) reworks how the partitioning
level for OpenACC "private" variables is calculated and represented in
the compiler. There have been two previous approaches:

 - The first (by Chung-Lin Tang) recorded which variables should be
   made private per-gang in each front end (i.e. separately in C, C++
   and Fortran) using a new attribute "oacc gangprivate". This was deemed
   too early; the final determination about which loops are assigned
   which parallelism level has not yet been made at parse time.

 - The second, last discussed here:

     https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00726.html

   moved the analysis of OpenACC contexts to determine parallelism levels
   to omp-low.c (but kept the "oacc gangprivate" attribute and the NVPTX
   backend parts). However (as mentioned in that mail), this is still
   too early: in fact the final determination of the parallelism level
   for each loop (especially for loops without explicit gang/worker/vector
   clauses) does not happen until we reach the device compiler, in the
   oaccloops pass.

This patch builds on the second approach, but delays fixing the parallelism
level of each "private" variable (those that are addressable, and declared
private using OpenACC clauses or by defining them in a scope nested
within a compute region or partitioned loop) until the oaccdevlow pass.
This is done by adding a new internal UNIQUE function (OACC_PRIVATE)
that lists (the address of) each private variable as an argument.
These new internal functions fit into the existing scheme for demarking
OpenACC loops, as described in comments in the patch.

Use of the "oacc gangprivate" attribute is now restricted to the NVPTX
backend (and could probably be replaced with some lighter-weight mechanism
as a followup).

Tested with offloading to NVPTX and GCN (separately). I will apply to
the openacc-gcc-9-branch shortly.

Thanks,

Julian

ChangeLog

	gcc/
	* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename
	to...
	(gcn_goacc_adjust_private_decl): ...this.
	* config/gcn/gcn-tree.c (diagnostic-core.h): Include.
	(gcn_goacc_adjust_gangprivate_decl): Rename to...
	(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
	* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
	* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
	(nvptx_goacc_adjust_private_decl): New function.
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook using above function.
	* doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
	* doc/tm.texi: Regenerated.
	* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
	* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
	* omp-low.c (omp_context): Remove oacc_partitioning_levels field.
	(lower_oacc_reductions): Add PRIVATE_MARKER parameter.  Insert before
	fork.
	(lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify its
	gimple call arguments as appropriate. Don't set
	oacc_partitioning_levels in omp_context. Pass private_marker to
	lower_oacc_reductions.
	(oacc_record_private_var_clauses): Don't check for NULL ctx.
	(make_oacc_private_marker): New function.
	(lower_omp_for): Only call oacc_record_vars_in_bind for
	OpenACC contexts.  Create private marker and pass to
	lower_oacc_head_tail.
	(lower_omp_target): Remove unnecessary call to
	oacc_record_private_var_clauses. Remove call to mark_oacc_gangprivate.
	Create private marker and pass to lower_oacc_reductions.
	(process_oacc_gangprivate_1): Remove.
	(lower_omp_1): Only call oacc_record_vars_in_bind for OpenACC.  Don't
	iterate over contexts calling process_oacc_gangprivate_1.
	(omp-offload.c (oacc_loop_xform_head_tail): Treat
	private-variable markers like fork/join when transforming head/tail
	sequences.
	(execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE instead of
	"oacc gangprivate" attributes to determine partitioning level of
	variables.
	* omp-sese.c (find_gangprivate_vars): New function.
	(find_local_vars_to_propagate): Use GANGPRIVATE_VARS parameter instead
	of "oacc gangprivate" attribute to determine which variables are
	gang-private.
	(oacc_do_neutering): Use find_gangprivate_vars.
	* target.def (adjust_gangprivate_decl): Rename to...
	(adjust_private_decl): ...this.  Update documentation (briefly).

	libgomp/
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Use
	oaccdevlow dump and update scanned output.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: Likewise.
	Add missing atomic to force worker partitioning for test variable.
---
 gcc/config/gcn/gcn-protos.h                   |   2 +-
 gcc/config/gcn/gcn-tree.c                     |   6 +-
 gcc/config/gcn/gcn.c                          |   4 +-
 gcc/config/nvptx/nvptx.c                      |  26 ++++
 gcc/doc/tm.texi                               |   5 +-
 gcc/doc/tm.texi.in                            |   2 +-
 gcc/internal-fn.c                             |   2 +
 gcc/internal-fn.h                             |   3 +-
 gcc/omp-low.c                                 | 127 ++++++++++++------
 gcc/omp-offload.c                             |  56 ++++++--
 gcc/omp-sese.c                                |  54 +++++++-
 gcc/target.def                                |   7 +-
 .../gangprivate-attrib-1.f90                  |   4 +-
 .../gangprivate-attrib-2.f90                  |   8 +-
 14 files changed, 232 insertions(+), 74 deletions(-)

Patch
diff mbox series

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index 1711862c6a2..e33c0598fee 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -39,7 +39,7 @@  extern rtx gcn_gen_undef (machine_mode);
 extern bool gcn_global_address_p (rtx);
 extern tree gcn_goacc_create_propagation_record (tree record_type, bool sender,
 						 const char *name);
-extern void gcn_goacc_adjust_gangprivate_decl (tree var);
+extern void gcn_goacc_adjust_private_decl (tree var, int level);
 extern void gcn_goacc_reduction (gcall *call);
 extern bool gcn_hard_regno_rename_ok (unsigned int from_reg,
 				      unsigned int to_reg);
diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index 04902a39b29..db8e290dc78 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -44,6 +44,7 @@ 
 #include "cgraph.h"
 #include "targhooks.h"
 #include "langhooks-def.h"
+#include "diagnostic-core.h"
 
 /* }}}  */
 /* {{{ OMP GCN pass.
@@ -697,8 +698,11 @@  gcn_goacc_create_propagation_record (tree record_type, bool sender,
 }
 
 void
-gcn_goacc_adjust_gangprivate_decl (tree var)
+gcn_goacc_adjust_private_decl (tree var, int level)
 {
+  if (level != GOMP_DIM_GANG)
+    return;
+
   tree type = TREE_TYPE (var);
   tree lds_type = build_qualified_type (type,
 		    TYPE_QUALS_NO_ADDR_SPACE (type)
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index e0a558b289a..2835a3d7141 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -6044,8 +6044,8 @@  print_operand (FILE *file, rtx x, int code)
 #undef  TARGET_GOACC_CREATE_PROPAGATION_RECORD
 #define TARGET_GOACC_CREATE_PROPAGATION_RECORD \
   gcn_goacc_create_propagation_record
-#undef  TARGET_GOACC_ADJUST_GANGPRIVATE_DECL
-#define TARGET_GOACC_ADJUST_GANGPRIVATE_DECL gcn_goacc_adjust_gangprivate_decl
+#undef  TARGET_GOACC_ADJUST_PRIVATE_DECL
+#define TARGET_GOACC_ADJUST_PRIVATE_DECL gcn_goacc_adjust_private_decl
 #undef  TARGET_GOACC_FORK_JOIN
 #define TARGET_GOACC_FORK_JOIN gcn_fork_join
 #undef  TARGET_GOACC_REDUCTION
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d6b2881d110..2a41d565994 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -76,6 +76,7 @@ 
 #include "intl.h"
 #include "tree-hash-traits.h"
 #include "omp-sese.h"
+#include "tree-pretty-print.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -6019,6 +6020,28 @@  nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+/* Implement TARGET_GOACC_ADJUST_PRIVATE_DECL.  Set "oacc gangprivate"
+   attribute for gang-private variable declarations.  */
+
+void
+nvptx_goacc_adjust_private_decl (tree decl, int level)
+{
+  if (level != GOMP_DIM_GANG)
+    return;
+
+  if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Setting 'oacc gangprivate' attribute for decl:");
+	  print_generic_decl (dump_file, decl, TDF_SLIM);
+	  fputc ('\n', dump_file);
+	}
+      tree id = get_identifier ("oacc gangprivate");
+      DECL_ATTRIBUTES (decl) = tree_cons (id, NULL, DECL_ATTRIBUTES (decl));
+    }
+}
+
 /* Implement TARGET_GOACC_EXPAND_ACCEL_VAR.  Place "oacc gangprivate"
    variables in shared memory.  */
 
@@ -6201,6 +6224,9 @@  nvptx_set_current_function (tree fndecl)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_ADJUST_PRIVATE_DECL
+#define TARGET_GOACC_ADJUST_PRIVATE_DECL nvptx_goacc_adjust_private_decl
+
 #undef TARGET_GOACC_EXPAND_ACCEL_VAR
 #define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 536a436b1c4..e44f8058421 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6162,8 +6162,9 @@  memories.  A return value of NULL indicates that the target does not
 handle this VAR_DECL, and normal RTL expanding is resumed.
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_GANGPRIVATE_DECL (tree @var{var})
-Tweak variable declaration for a gang-private variable.
+@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_PRIVATE_DECL (tree @var{var}, @var{int})
+Tweak variable declaration for a private variable at the specified
+parallelism level.
 @end deftypefn
 
 @deftypevr {Target Hook} bool TARGET_GOACC_WORKER_PARTITIONING
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index c0b92f25da7..74a1b03f1e8 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4210,7 +4210,7 @@  address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_GOACC_EXPAND_ACCEL_VAR
 
-@hook TARGET_GOACC_ADJUST_GANGPRIVATE_DECL
+@hook TARGET_GOACC_ADJUST_PRIVATE_DECL
 
 @hook TARGET_GOACC_WORKER_PARTITIONING
 
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 04081f36c4d..9b5e518cc4b 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2617,6 +2617,8 @@  expand_UNIQUE (internal_fn, gcall *stmt)
       else
 	gcc_unreachable ();
       break;
+    case IFN_UNIQUE_OACC_PRIVATE:
+      break;
     }
 
   if (pattern)
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 7164ee5cf3c..a2810edc1b4 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -36,7 +36,8 @@  along with GCC; see the file COPYING3.  If not see
 #define IFN_UNIQUE_CODES				  \
   DEF(UNSPEC),	\
     DEF(OACC_FORK), DEF(OACC_JOIN),		\
-    DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK)
+    DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK),	\
+    DEF(OACC_PRIVATE)
 
 enum ifn_unique_kind {
 #define DEF(X) IFN_UNIQUE_##X
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f0d87a686fe..eddae644416 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -147,9 +147,6 @@  struct omp_context
   /* A tree_list of the reduction clauses in outer contexts.  */
   tree outer_reduction_clauses;
 
-  /* The number of levels of OpenACC partitioning invoked in this context.  */
-  unsigned oacc_partitioning_levels;
-
   /* Addressable variable decls in this context.  */
   vec<tree> *oacc_addressable_var_decls;
 };
@@ -6148,8 +6145,9 @@  lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list,
 
 static void
 lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
-		       gcall *fork, gcall *join, gimple_seq *fork_seq,
-		       gimple_seq *join_seq, omp_context *ctx)
+		       gcall *fork, gcall *private_marker, gcall *join,
+		       gimple_seq *fork_seq, gimple_seq *join_seq,
+		       omp_context *ctx)
 {
   gimple_seq before_fork = NULL;
   gimple_seq after_fork = NULL;
@@ -6351,6 +6349,8 @@  lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
   /* Now stitch things together.  */
   gimple_seq_add_seq (fork_seq, before_fork);
+  if (private_marker)
+    gimple_seq_add_stmt (fork_seq, private_marker);
   if (fork)
     gimple_seq_add_stmt (fork_seq, fork);
   gimple_seq_add_seq (fork_seq, after_fork);
@@ -7048,7 +7048,7 @@  lower_oacc_loop_marker (location_t loc, tree ddvar, bool head,
    HEAD and TAIL.  */
 
 static void
-lower_oacc_head_tail (location_t loc, tree clauses,
+lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker,
 		      gimple_seq *head, gimple_seq *tail, omp_context *ctx)
 {
   bool inner = false;
@@ -7056,13 +7056,19 @@  lower_oacc_head_tail (location_t loc, tree clauses,
   gimple_seq_add_stmt (head, gimple_build_assign (ddvar, integer_zero_node));
 
   unsigned count = lower_oacc_head_mark (loc, ddvar, clauses, head, ctx);
+
+  if (private_marker)
+    {
+      gimple_set_location (private_marker, loc);
+      gimple_call_set_lhs (private_marker, ddvar);
+      gimple_call_set_arg (private_marker, 1, ddvar);
+    }
+
   tree fork_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK);
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
 
-  ctx->oacc_partitioning_levels = count;
-
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -7089,7 +7095,8 @@  lower_oacc_head_tail (location_t loc, tree clauses,
 			      &join_seq);
 
       lower_oacc_reductions (loc, clauses, place, inner,
-			     fork, join, &fork_seq, &join_seq,  ctx);
+			     fork, (count == 1) ? private_marker : NULL,
+			     join, &fork_seq, &join_seq,  ctx);
 
       /* Append this level to head. */
       gimple_seq_add_seq (head, fork_seq);
@@ -8755,9 +8762,6 @@  oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
 {
   tree c;
 
-  if (!ctx)
-    return;
-
   for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
     if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
       {
@@ -8821,6 +8825,58 @@  mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
     }
 }
 
+/* Build an internal UNIQUE function with type IFN_UNIQUE_OACC_PRIVATE listing
+   the addresses of variables that should be made private at the surrounding
+   parallelism level.  Such functions appear in the gimple code stream in two
+   forms, e.g. for a partitioned loop:
+
+      .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6, 1, 68);
+      .data_dep.6 = .UNIQUE (OACC_PRIVATE, .data_dep.6, -1, &w);
+      .data_dep.6 = .UNIQUE (OACC_FORK, .data_dep.6, -1);
+      .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6);
+
+   or alternatively, OACC_PRIVATE can appear at the top level of a parallel,
+   not as part of a HEAD_MARK sequence:
+
+      .UNIQUE (OACC_PRIVATE, 0, 0, &w);
+
+   For such stand-alone appearances, the 3rd argument is always 0, denoting
+   gang partitioning.  */
+
+static gcall *
+make_oacc_private_marker (omp_context *ctx)
+{
+  int i;
+  tree decl;
+
+  if (ctx->oacc_addressable_var_decls->length () == 0)
+    return NULL;
+
+  auto_vec<tree, 5> args;
+
+  args.quick_push (build_int_cst (integer_type_node,
+				 IFN_UNIQUE_OACC_PRIVATE));
+  args.quick_push (integer_zero_node);
+  args.quick_push (integer_minus_one_node);
+
+  FOR_EACH_VEC_ELT (*ctx->oacc_addressable_var_decls, i, decl)
+    {
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+	{
+	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
+	  if (inner_decl)
+	    {
+	      decl = inner_decl;
+	      break;
+	    }
+	}
+      tree addr = build_fold_addr_expr (decl);
+      args.safe_push (addr);
+    }
+
+  return gimple_build_call_internal_vec (IFN_UNIQUE, args);
+}
+
 /* Lower code for an OMP loop directive.  */
 
 static void
@@ -8857,6 +8913,8 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       gbind *inner_bind
 	= as_a <gbind *> (gimple_seq_first_stmt (omp_for_body));
       tree vars = gimple_bind_vars (inner_bind);
+      if (is_gimple_omp_oacc (ctx->stmt))
+	oacc_record_vars_in_bind (ctx, vars);
       gimple_bind_append_vars (new_stmt, vars);
       /* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't
 	 keep them on the inner_bind and it's block.  */
@@ -8953,6 +9011,12 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   lower_omp (gimple_omp_body_ptr (stmt), ctx);
 
+  gcall *private_marker = NULL;
+  if (is_gimple_omp_oacc (ctx->stmt)
+      && !gimple_seq_empty_p (omp_for_body)
+      && !gimple_seq_empty_p (omp_for_body))
+    private_marker = make_oacc_private_marker (ctx);
+
   /* Lower the header expressions.  At this point, we can assume that
      the header is of the form:
 
@@ -8989,7 +9053,7 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (is_gimple_omp_oacc (ctx->stmt)
       && !ctx_in_oacc_kernels_region (ctx))
     lower_oacc_head_tail (gimple_location (stmt),
-			  gimple_omp_for_clauses (stmt),
+			  gimple_omp_for_clauses (stmt), private_marker,
 			  &oacc_head, &oacc_tail, ctx);
 
   /* Add OpenACC partitioning and reduction markers just before the loop.  */
@@ -9872,8 +9936,6 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
-  oacc_record_private_var_clauses (ctx, clauses);
-
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -10242,8 +10304,6 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
-      mark_oacc_gangprivate (ctx->oacc_addressable_var_decls, ctx);
-
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -11195,8 +11255,14 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	     them as a dummy GANG loop.  */
 	  tree level = build_int_cst (integer_type_node, GOMP_DIM_GANG);
 
+	  gcall *private_marker = make_oacc_private_marker (ctx);
+
+	  if (private_marker)
+	    gimple_call_set_arg (private_marker, 2, level);
+
 	  lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level,
-				 false, NULL, NULL, &fork_seq, &join_seq, ctx);
+				 false, NULL, private_marker, NULL, &fork_seq,
+				 &join_seq, ctx);
 	}
 
       gimple_seq_add_seq (&new_body, fork_seq);
@@ -11307,26 +11373,6 @@  lower_omp_grid_body (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		       gimple_build_omp_return (false));
 }
 
-/* Find gang-private variables in a context.  */
-
-static int
-process_oacc_gangprivate_1 (splay_tree_node node, void * /* data */)
-{
-  omp_context *ctx = (omp_context *) node->value;
-  unsigned level_total = 0;
-  omp_context *thisctx;
-
-  for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
-    level_total += thisctx->oacc_partitioning_levels;
-
-  /* If the current context and parent contexts are distributed over a
-     total of one parallelism level, we have gang partitioning.  */
-  if (level_total == 1)
-    mark_oacc_gangprivate (ctx->oacc_addressable_var_decls, ctx);
-
-  return 0;
-}
-
 /* Helper to lookup dynamic array through nested omp contexts. Returns
    TREE_LIST of dimensions, and the CTX where it was found in *CTX_P.  */
 
@@ -11666,7 +11712,9 @@  lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
-      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));
+      if (ctx && is_gimple_omp_oacc (ctx->stmt))
+	oacc_record_vars_in_bind (ctx,
+				  gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
@@ -11917,7 +11965,6 @@  execute_lower_omp (void)
 
   if (all_contexts)
     {
-      splay_tree_foreach (all_contexts, process_oacc_gangprivate_1, NULL);
       splay_tree_delete (all_contexts);
       all_contexts = NULL;
     }
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index a6f64aac37e..e489ad3073a 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1110,7 +1110,9 @@  oacc_loop_xform_head_tail (gcall *from, int level)
 	    = ((enum ifn_unique_kind)
 	       TREE_INT_CST_LOW (gimple_call_arg (stmt, 0)));
 
-	  if (k == IFN_UNIQUE_OACC_FORK || k == IFN_UNIQUE_OACC_JOIN)
+	  if (k == IFN_UNIQUE_OACC_FORK
+	      || k == IFN_UNIQUE_OACC_JOIN
+	      || k == IFN_UNIQUE_OACC_PRIVATE)
 	    *gimple_call_arg_ptr (stmt, 2) = replacement;
 	  else if (k == kind && stmt != from)
 	    break;
@@ -1828,6 +1830,8 @@  execute_oacc_device_lower ()
   for (unsigned i = 0; i < GOMP_DIM_MAX; i++)
     dims[i] = oacc_get_fn_dim_size (current_function_decl, i);
 
+  hash_set<tree> adjusted_vars;
+
   /* Now lower internal loop functions to target-specific code
      sequences.  */
   basic_block bb;
@@ -1904,6 +1908,43 @@  execute_oacc_device_lower ()
 		case IFN_UNIQUE_OACC_TAIL_MARK:
 		  remove = true;
 		  break;
+
+		case IFN_UNIQUE_OACC_PRIVATE:
+		  {
+		    HOST_WIDE_INT level
+		      = TREE_INT_CST_LOW (gimple_call_arg (call, 2));
+		    if (level == -1)
+		      break;
+		    for (unsigned i = 3;
+			 i < gimple_call_num_args (call);
+			 i++)
+		      {
+			tree arg = gimple_call_arg (call, i);
+			gcc_assert (TREE_CODE (arg) == ADDR_EXPR);
+			tree decl = TREE_OPERAND (arg, 0);
+			if (dump_file && (dump_flags & TDF_DETAILS))
+			  {
+			    static char const *const axes[] =
+			      /* Must be kept in sync with GOMP_DIM
+				 enumeration.  */
+			      { "gang", "worker", "vector" };
+			    fprintf (dump_file, "Decl UID %u has %s "
+				     "partitioning:", DECL_UID (decl),
+				     axes[level]);
+			    print_generic_decl (dump_file, decl, TDF_SLIM);
+			    fputc ('\n', dump_file);
+			  }
+			if (targetm.goacc.adjust_private_decl)
+			  {
+			    tree oldtype = TREE_TYPE (decl);
+			    targetm.goacc.adjust_private_decl (decl, level);
+			    if (TREE_TYPE (decl) != oldtype)
+			      adjusted_vars.add (decl);
+			  }
+		      }
+		    remove = true;
+		  }
+		  break;
 		}
 	      break;
 	    }
@@ -1952,21 +1993,10 @@  execute_oacc_device_lower ()
      uses (2).  At least on AMD GCN, there are atomic operations that work
      directly in the LDS address space.  */
 
-  if (targetm.goacc.adjust_gangprivate_decl)
+  if (targetm.goacc.adjust_private_decl)
     {
       tree var;
       unsigned i;
-      hash_set<tree> adjusted_vars;
-
-      FOR_EACH_LOCAL_DECL (cfun, i, var)
-	{
-	  if (!VAR_P (var)
-	      || !lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
-	    continue;
-
-	  targetm.goacc.adjust_gangprivate_decl (var);
-	  adjusted_vars.add (var);
-	}
 
       FOR_ALL_BB_FN (bb, cfun)
 	for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
diff --git a/gcc/omp-sese.c b/gcc/omp-sese.c
index d7267017677..13d803fb1cd 100644
--- a/gcc/omp-sese.c
+++ b/gcc/omp-sese.c
@@ -713,19 +713,61 @@  find_partitioned_var_uses (parallel_g *par, unsigned outer_mask,
     }
 }
 
+/* Gang-private variables (typically placed in a GPU's shared memory) do not
+   need to be processed by the worker-propagation mechanism.  Populate the
+   GANGPRIVATE_VARS set with any such variables found in the current
+   function.  */
+
+static void
+find_gangprivate_vars (hash_set<tree> *gangprivate_vars)
+{
+  basic_block block;
+
+  FOR_EACH_BB_FN (block, cfun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (block);
+	   !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+
+	  if (gimple_call_internal_p (stmt, IFN_UNIQUE))
+	    {
+	      enum ifn_unique_kind k = ((enum ifn_unique_kind)
+		TREE_INT_CST_LOW (gimple_call_arg (stmt, 0)));
+	      if (k == IFN_UNIQUE_OACC_PRIVATE)
+		{
+		  HOST_WIDE_INT level
+		    = TREE_INT_CST_LOW (gimple_call_arg (stmt, 2));
+		  if (level != GOMP_DIM_GANG)
+		    continue;
+		  for (unsigned i = 3; i < gimple_call_num_args (stmt); i++)
+		    {
+		      tree arg = gimple_call_arg (stmt, i);
+		      gcc_assert (TREE_CODE (arg) == ADDR_EXPR);
+		      tree decl = TREE_OPERAND (arg, 0);
+		      gangprivate_vars->add (decl);
+		    }
+		}
+	    }
+	}
+    }
+}
+
 static void
 find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
 			      hash_set<tree> *partitioned_var_uses,
+			      hash_set<tree> *gangprivate_vars,
 			      vec<propagation_set *> *prop_set)
 {
   unsigned mask = outer_mask | par->mask;
 
   if (par->inner)
     find_local_vars_to_propagate (par->inner, mask, partitioned_var_uses,
-				  prop_set);
+				  gangprivate_vars, prop_set);
   if (par->next)
     find_local_vars_to_propagate (par->next, outer_mask, partitioned_var_uses,
-				  prop_set);
+				  gangprivate_vars, prop_set);
 
   if (!(mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
     {
@@ -747,8 +789,7 @@  find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
 		      || is_global_var (var)
 		      || AGGREGATE_TYPE_P (TREE_TYPE (var))
 		      || !partitioned_var_uses->contains (var)
-		      || lookup_attribute ("oacc gangprivate",
-					   DECL_ATTRIBUTES (var)))
+		      || gangprivate_vars->contains (var))
 		    continue;
 
 		  if (stmt_may_clobber_ref_p (stmt, var))
@@ -1353,9 +1394,12 @@  oacc_do_neutering (void)
 			       &prop_set);
 
   hash_set<tree> partitioned_var_uses;
+  hash_set<tree> gangprivate_vars;
 
+  find_gangprivate_vars (&gangprivate_vars);
   find_partitioned_var_uses (par, mask, &partitioned_var_uses);
-  find_local_vars_to_propagate (par, mask, &partitioned_var_uses, &prop_set);
+  find_local_vars_to_propagate (par, mask, &partitioned_var_uses,
+				&gangprivate_vars, &prop_set);
 
   FOR_ALL_BB_FN (bb, cfun)
     {
diff --git a/gcc/target.def b/gcc/target.def
index c9c3f650e8a..d4901389cbc 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1730,9 +1730,10 @@  rtx, (tree var),
 NULL)
 
 DEFHOOK
-(adjust_gangprivate_decl,
-"Tweak variable declaration for a gang-private variable.",
-void, (tree var),
+(adjust_private_decl,
+"Tweak variable declaration for a private variable at the specified\n\
+parallelism level.",
+void, (tree var, int),
 NULL)
 
 DEFHOOK
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
index 9158b6f4768..dafc70c743e 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -1,8 +1,8 @@ 
 ! Test for "oacc gangprivate" attribute on gang-private variables
 
 ! { dg-do run }
-! { dg-additional-options "-fdump-tree-omplower-details" }
-! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } } */
+! { dg-additional-options "-fdump-tree-oaccdevlow-details" }
+! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has gang partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */
 
 program main
   integer :: w, arr(0:31)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
index d147229d91e..90e06be24ff 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
@@ -1,8 +1,8 @@ 
-! Test for lack of "oacc gangprivate" attribute on worker-private variables
+! Test for worker-private variables
 
 ! { dg-do run }
-! { dg-additional-options "-fdump-tree-omplower-details" }
-! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl" 0 "omplower" } } */
+! { dg-additional-options "-fdump-tree-oaccdevlow-details" }
+! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has worker partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */
 
 program main
   integer :: w, arr(0:31)
@@ -13,7 +13,9 @@  program main
       w = 0
       !$acc loop seq
       do i = 0, 31
+        !$acc atomic update
         w = w + 1
+        !$acc end atomic
       end do
       arr(j) = w
     end do