From patchwork Wed Oct 16 19:08:49 2019
X-Patchwork-Submitter: Julian Brown
X-Patchwork-Id: 1178083
From: Julian Brown
CC: Thomas Schwinge, Jakub Jelinek, Chung-Lin Tang, Tom de Vries
Subject: [PATCH] [og9] Re-do OpenACC private variable resolution
Date: Wed, 16 Oct 2019 12:08:49 -0700
Message-ID: <20191016190849.70347-1-julian@codesourcery.com>

This patch (for the openacc-gcc-9-branch) reworks how the partitioning level
for OpenACC "private" variables is calculated and represented in the
compiler.

There have been two previous approaches:

 - The first (by Chung-Lin Tang) recorded which variables should be made
   private per-gang in each front end (i.e. separately in C, C++ and
   Fortran) using a new attribute "oacc gangprivate". This was deemed too
   early; the final determination about which loops are assigned which
   parallelism level has not yet been made at parse time.

 - The second, last discussed here:

     https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00726.html

   moved the analysis of OpenACC contexts to determine parallelism levels
   to omp-low.c (but kept the "oacc gangprivate" attribute and the NVPTX
   backend parts). However (as mentioned in that mail), this is still too
   early: in fact the final determination of the parallelism level for
   each loop (especially for loops without explicit gang/worker/vector
   clauses) does not happen until we reach the device compiler, in the
   oaccloops pass.

This patch builds on the second approach, but delays fixing the parallelism
level of each "private" variable (those that are addressable, and declared
private using OpenACC clauses or by defining them in a scope nested within a
compute region or partitioned loop) until the oaccdevlow pass. This is done
by adding a new internal UNIQUE function (OACC_PRIVATE) that lists (the
address of) each private variable as an argument. These new internal
functions fit into the existing scheme for demarking OpenACC loops, as
described in comments in the patch.

Use of the "oacc gangprivate" attribute is now restricted to the NVPTX
backend (and could probably be replaced with some lighter-weight mechanism
as a followup).

Tested with offloading to NVPTX and GCN (separately). I will apply to the
openacc-gcc-9-branch shortly.

Thanks,

Julian

ChangeLog

	gcc/
	* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename
	to...
	(gcn_goacc_adjust_private_decl): ...this.
	* config/gcn/gcn-tree.c (diagnostic-core.h): Include.
	(gcn_goacc_adjust_gangprivate_decl): Rename to...
	(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
	* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
	* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
	(nvptx_goacc_adjust_private_decl): New function.
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook using above function.
	* doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
	* doc/tm.texi: Regenerated.
	* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
	* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
	* omp-low.c (omp_context): Remove oacc_partitioning_levels field.
	(lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert before
	fork.
	(lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify its gimple
	call arguments as appropriate. Don't set oacc_partitioning_levels in
	omp_context. Pass private_marker to lower_oacc_reductions.
	(oacc_record_private_var_clauses): Don't check for NULL ctx.
	(make_oacc_private_marker): New function.
	(lower_omp_for): Only call oacc_record_vars_in_bind for OpenACC
	contexts. Create private marker and pass to lower_oacc_head_tail.
	(lower_omp_target): Remove unnecessary call to
	oacc_record_private_var_clauses. Remove call to mark_oacc_gangprivate.
	Create private marker and pass to lower_oacc_reductions.
	(process_oacc_gangprivate_1): Remove.
	(lower_omp_1): Only call oacc_record_vars_in_bind for OpenACC. Don't
	iterate over contexts calling process_oacc_gangprivate_1.
	* omp-offload.c (oacc_loop_xform_head_tail): Treat private-variable
	markers like fork/join when transforming head/tail sequences.
	(execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE instead of
	"oacc gangprivate" attributes to determine partitioning level of
	variables.
	* omp-sese.c (find_gangprivate_vars): New function.
	(find_local_vars_to_propagate): Use GANGPRIVATE_VARS parameter instead
	of "oacc gangprivate" attribute to determine which variables are
	gang-private.
	(oacc_do_neutering): Use find_gangprivate_vars.
	* target.def (adjust_gangprivate_decl): Rename to...
	(adjust_private_decl): ...this. Update documentation (briefly).

	libgomp/
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Use
	oaccdevlow dump and update scanned output.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: Likewise.
	Add missing atomic to force worker partitioning for test variable.
---
 gcc/config/gcn/gcn-protos.h                   |   2 +-
 gcc/config/gcn/gcn-tree.c                     |   6 +-
 gcc/config/gcn/gcn.c                          |   4 +-
 gcc/config/nvptx/nvptx.c                      |  26 ++++
 gcc/doc/tm.texi                               |   5 +-
 gcc/doc/tm.texi.in                            |   2 +-
 gcc/internal-fn.c                             |   2 +
 gcc/internal-fn.h                             |   3 +-
 gcc/omp-low.c                                 | 127 ++++++++++++------
 gcc/omp-offload.c                             |  56 ++++++--
 gcc/omp-sese.c                                |  54 +++++++-
 gcc/target.def                                |   7 +-
 .../gangprivate-attrib-1.f90                  |   4 +-
 .../gangprivate-attrib-2.f90                  |   8 +-
 14 files changed, 232 insertions(+), 74 deletions(-)

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index 1711862c6a2..e33c0598fee 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -39,7 +39,7 @@ extern rtx gcn_gen_undef (machine_mode);
 extern bool gcn_global_address_p (rtx);
 extern tree gcn_goacc_create_propagation_record (tree record_type, bool sender,
 						 const char *name);
-extern void gcn_goacc_adjust_gangprivate_decl (tree var);
+extern void gcn_goacc_adjust_private_decl (tree var, int level);
 extern void gcn_goacc_reduction (gcall *call);
 extern bool gcn_hard_regno_rename_ok (unsigned int from_reg,
 				      unsigned int to_reg);
diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index 04902a39b29..db8e290dc78 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -44,6 +44,7 @@
 #include "cgraph.h"
 #include "targhooks.h"
 #include "langhooks-def.h"
+#include "diagnostic-core.h"
 
 /* }}} */
 /* {{{ OMP GCN pass.
@@ -697,8 +698,11 @@ gcn_goacc_create_propagation_record (tree record_type, bool sender, } void -gcn_goacc_adjust_gangprivate_decl (tree var) +gcn_goacc_adjust_private_decl (tree var, int level) { + if (level != GOMP_DIM_GANG) + return; + tree type = TREE_TYPE (var); tree lds_type = build_qualified_type (type, TYPE_QUALS_NO_ADDR_SPACE (type) diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index e0a558b289a..2835a3d7141 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -6044,8 +6044,8 @@ print_operand (FILE *file, rtx x, int code) #undef TARGET_GOACC_CREATE_PROPAGATION_RECORD #define TARGET_GOACC_CREATE_PROPAGATION_RECORD \ gcn_goacc_create_propagation_record -#undef TARGET_GOACC_ADJUST_GANGPRIVATE_DECL -#define TARGET_GOACC_ADJUST_GANGPRIVATE_DECL gcn_goacc_adjust_gangprivate_decl +#undef TARGET_GOACC_ADJUST_PRIVATE_DECL +#define TARGET_GOACC_ADJUST_PRIVATE_DECL gcn_goacc_adjust_private_decl #undef TARGET_GOACC_FORK_JOIN #define TARGET_GOACC_FORK_JOIN gcn_fork_join #undef TARGET_GOACC_REDUCTION diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index d6b2881d110..2a41d565994 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -76,6 +76,7 @@ #include "intl.h" #include "tree-hash-traits.h" #include "omp-sese.h" +#include "tree-pretty-print.h" /* This file should be included last. */ #include "target-def.h" @@ -6019,6 +6020,28 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t) return false; } +/* Implement TARGET_GOACC_ADJUST_PRIVATE_DECL. Set "oacc gangprivate" + attribute for gang-private variable declarations. 
*/ + +void +nvptx_goacc_adjust_private_decl (tree decl, int level) +{ + if (level != GOMP_DIM_GANG) + return; + + if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Setting 'oacc gangprivate' attribute for decl:"); + print_generic_decl (dump_file, decl, TDF_SLIM); + fputc ('\n', dump_file); + } + tree id = get_identifier ("oacc gangprivate"); + DECL_ATTRIBUTES (decl) = tree_cons (id, NULL, DECL_ATTRIBUTES (decl)); + } +} + /* Implement TARGET_GOACC_EXPAND_ACCEL_VAR. Place "oacc gangprivate" variables in shared memory. */ @@ -6201,6 +6224,9 @@ nvptx_set_current_function (tree fndecl) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_GOACC_ADJUST_PRIVATE_DECL +#define TARGET_GOACC_ADJUST_PRIVATE_DECL nvptx_goacc_adjust_private_decl + #undef TARGET_GOACC_EXPAND_ACCEL_VAR #define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 536a436b1c4..e44f8058421 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6162,8 +6162,9 @@ memories. A return value of NULL indicates that the target does not handle this VAR_DECL, and normal RTL expanding is resumed. @end deftypefn -@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_GANGPRIVATE_DECL (tree @var{var}) -Tweak variable declaration for a gang-private variable. +@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_PRIVATE_DECL (tree @var{var}, @var{int}) +Tweak variable declaration for a private variable at the specified +parallelism level. @end deftypefn @deftypevr {Target Hook} bool TARGET_GOACC_WORKER_PARTITIONING diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index c0b92f25da7..74a1b03f1e8 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4210,7 +4210,7 @@ address; but often a machine-dependent strategy can generate better code. 
@hook TARGET_GOACC_EXPAND_ACCEL_VAR -@hook TARGET_GOACC_ADJUST_GANGPRIVATE_DECL +@hook TARGET_GOACC_ADJUST_PRIVATE_DECL @hook TARGET_GOACC_WORKER_PARTITIONING diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 04081f36c4d..9b5e518cc4b 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2617,6 +2617,8 @@ expand_UNIQUE (internal_fn, gcall *stmt) else gcc_unreachable (); break; + case IFN_UNIQUE_OACC_PRIVATE: + break; } if (pattern) diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 7164ee5cf3c..a2810edc1b4 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3. If not see #define IFN_UNIQUE_CODES \ DEF(UNSPEC), \ DEF(OACC_FORK), DEF(OACC_JOIN), \ - DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK) + DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK), \ + DEF(OACC_PRIVATE) enum ifn_unique_kind { #define DEF(X) IFN_UNIQUE_##X diff --git a/gcc/omp-low.c b/gcc/omp-low.c index f0d87a686fe..eddae644416 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -147,9 +147,6 @@ struct omp_context /* A tree_list of the reduction clauses in outer contexts. */ tree outer_reduction_clauses; - /* The number of levels of OpenACC partitioning invoked in this context. */ - unsigned oacc_partitioning_levels; - /* Addressable variable decls in this context. */ vec *oacc_addressable_var_decls; }; @@ -6148,8 +6145,9 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list, static void lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, - gcall *fork, gcall *join, gimple_seq *fork_seq, - gimple_seq *join_seq, omp_context *ctx) + gcall *fork, gcall *private_marker, gcall *join, + gimple_seq *fork_seq, gimple_seq *join_seq, + omp_context *ctx) { gimple_seq before_fork = NULL; gimple_seq after_fork = NULL; @@ -6351,6 +6349,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, /* Now stitch things together. 
*/ gimple_seq_add_seq (fork_seq, before_fork); + if (private_marker) + gimple_seq_add_stmt (fork_seq, private_marker); if (fork) gimple_seq_add_stmt (fork_seq, fork); gimple_seq_add_seq (fork_seq, after_fork); @@ -7048,7 +7048,7 @@ lower_oacc_loop_marker (location_t loc, tree ddvar, bool head, HEAD and TAIL. */ static void -lower_oacc_head_tail (location_t loc, tree clauses, +lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker, gimple_seq *head, gimple_seq *tail, omp_context *ctx) { bool inner = false; @@ -7056,13 +7056,19 @@ lower_oacc_head_tail (location_t loc, tree clauses, gimple_seq_add_stmt (head, gimple_build_assign (ddvar, integer_zero_node)); unsigned count = lower_oacc_head_mark (loc, ddvar, clauses, head, ctx); + + if (private_marker) + { + gimple_set_location (private_marker, loc); + gimple_call_set_lhs (private_marker, ddvar); + gimple_call_set_arg (private_marker, 1, ddvar); + } + tree fork_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK); tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN); gcc_assert (count); - ctx->oacc_partitioning_levels = count; - for (unsigned done = 1; count; count--, done++) { gimple_seq fork_seq = NULL; @@ -7089,7 +7095,8 @@ lower_oacc_head_tail (location_t loc, tree clauses, &join_seq); lower_oacc_reductions (loc, clauses, place, inner, - fork, join, &fork_seq, &join_seq, ctx); + fork, (count == 1) ? private_marker : NULL, + join, &fork_seq, &join_seq, ctx); /* Append this level to head. 
*/ gimple_seq_add_seq (head, fork_seq); @@ -8755,9 +8762,6 @@ oacc_record_private_var_clauses (omp_context *ctx, tree clauses) { tree c; - if (!ctx) - return; - for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE) { @@ -8821,6 +8825,58 @@ mark_oacc_gangprivate (vec *decls, omp_context *ctx) } } +/* Build an internal UNIQUE function with type IFN_UNIQUE_OACC_PRIVATE listing + the addresses of variables that should be made private at the surrounding + parallelism level. Such functions appear in the gimple code stream in two + forms, e.g. for a partitioned loop: + + .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6, 1, 68); + .data_dep.6 = .UNIQUE (OACC_PRIVATE, .data_dep.6, -1, &w); + .data_dep.6 = .UNIQUE (OACC_FORK, .data_dep.6, -1); + .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6); + + or alternatively, OACC_PRIVATE can appear at the top level of a parallel, + not as part of a HEAD_MARK sequence: + + .UNIQUE (OACC_PRIVATE, 0, 0, &w); + + For such stand-alone appearances, the 3rd argument is always 0, denoting + gang partitioning. */ + +static gcall * +make_oacc_private_marker (omp_context *ctx) +{ + int i; + tree decl; + + if (ctx->oacc_addressable_var_decls->length () == 0) + return NULL; + + auto_vec args; + + args.quick_push (build_int_cst (integer_type_node, + IFN_UNIQUE_OACC_PRIVATE)); + args.quick_push (integer_zero_node); + args.quick_push (integer_minus_one_node); + + FOR_EACH_VEC_ELT (*ctx->oacc_addressable_var_decls, i, decl) + { + for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer) + { + tree inner_decl = maybe_lookup_decl (decl, thisctx); + if (inner_decl) + { + decl = inner_decl; + break; + } + } + tree addr = build_fold_addr_expr (decl); + args.safe_push (addr); + } + + return gimple_build_call_internal_vec (IFN_UNIQUE, args); +} + /* Lower code for an OMP loop directive. 
*/ static void @@ -8857,6 +8913,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) gbind *inner_bind = as_a (gimple_seq_first_stmt (omp_for_body)); tree vars = gimple_bind_vars (inner_bind); + if (is_gimple_omp_oacc (ctx->stmt)) + oacc_record_vars_in_bind (ctx, vars); gimple_bind_append_vars (new_stmt, vars); /* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't keep them on the inner_bind and it's block. */ @@ -8953,6 +9011,12 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) lower_omp (gimple_omp_body_ptr (stmt), ctx); + gcall *private_marker = NULL; + if (is_gimple_omp_oacc (ctx->stmt) + && !gimple_seq_empty_p (omp_for_body) + && !gimple_seq_empty_p (omp_for_body)) + private_marker = make_oacc_private_marker (ctx); + /* Lower the header expressions. At this point, we can assume that the header is of the form: @@ -8989,7 +9053,7 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) if (is_gimple_omp_oacc (ctx->stmt) && !ctx_in_oacc_kernels_region (ctx)) lower_oacc_head_tail (gimple_location (stmt), - gimple_omp_for_clauses (stmt), + gimple_omp_for_clauses (stmt), private_marker, &oacc_head, &oacc_tail, ctx); /* Add OpenACC partitioning and reduction markers just before the loop. */ @@ -9872,8 +9936,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) clauses = gimple_omp_target_clauses (stmt); - oacc_record_private_var_clauses (ctx, clauses); - gimple_seq dep_ilist = NULL; gimple_seq dep_olist = NULL; if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND)) @@ -10242,8 +10304,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) if (offloaded) { - mark_oacc_gangprivate (ctx->oacc_addressable_var_decls, ctx); - /* Declare all the variables created by mapping and the variables declared in the scope of the target body. */ record_vars_into (ctx->block_vars, child_fn); @@ -11195,8 +11255,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) them as a dummy GANG loop. 
*/ tree level = build_int_cst (integer_type_node, GOMP_DIM_GANG); + gcall *private_marker = make_oacc_private_marker (ctx); + + if (private_marker) + gimple_call_set_arg (private_marker, 2, level); + lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level, - false, NULL, NULL, &fork_seq, &join_seq, ctx); + false, NULL, private_marker, NULL, &fork_seq, + &join_seq, ctx); } gimple_seq_add_seq (&new_body, fork_seq); @@ -11307,26 +11373,6 @@ lower_omp_grid_body (gimple_stmt_iterator *gsi_p, omp_context *ctx) gimple_build_omp_return (false)); } -/* Find gang-private variables in a context. */ - -static int -process_oacc_gangprivate_1 (splay_tree_node node, void * /* data */) -{ - omp_context *ctx = (omp_context *) node->value; - unsigned level_total = 0; - omp_context *thisctx; - - for (thisctx = ctx; thisctx; thisctx = thisctx->outer) - level_total += thisctx->oacc_partitioning_levels; - - /* If the current context and parent contexts are distributed over a - total of one parallelism level, we have gang partitioning. */ - if (level_total == 1) - mark_oacc_gangprivate (ctx->oacc_addressable_var_decls, ctx); - - return 0; -} - /* Helper to lookup dynamic array through nested omp contexts. Returns TREE_LIST of dimensions, and the CTX where it was found in *CTX_P. 
*/ @@ -11666,7 +11712,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) ctx); break; case GIMPLE_BIND: - oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a (stmt))); + if (ctx && is_gimple_omp_oacc (ctx->stmt)) + oacc_record_vars_in_bind (ctx, + gimple_bind_vars (as_a (stmt))); lower_omp (gimple_bind_body_ptr (as_a (stmt)), ctx); maybe_remove_omp_member_access_dummy_vars (as_a (stmt)); break; @@ -11917,7 +11965,6 @@ execute_lower_omp (void) if (all_contexts) { - splay_tree_foreach (all_contexts, process_oacc_gangprivate_1, NULL); splay_tree_delete (all_contexts); all_contexts = NULL; } diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index a6f64aac37e..e489ad3073a 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -1110,7 +1110,9 @@ oacc_loop_xform_head_tail (gcall *from, int level) = ((enum ifn_unique_kind) TREE_INT_CST_LOW (gimple_call_arg (stmt, 0))); - if (k == IFN_UNIQUE_OACC_FORK || k == IFN_UNIQUE_OACC_JOIN) + if (k == IFN_UNIQUE_OACC_FORK + || k == IFN_UNIQUE_OACC_JOIN + || k == IFN_UNIQUE_OACC_PRIVATE) *gimple_call_arg_ptr (stmt, 2) = replacement; else if (k == kind && stmt != from) break; @@ -1828,6 +1830,8 @@ execute_oacc_device_lower () for (unsigned i = 0; i < GOMP_DIM_MAX; i++) dims[i] = oacc_get_fn_dim_size (current_function_decl, i); + hash_set adjusted_vars; + /* Now lower internal loop functions to target-specific code sequences. */ basic_block bb; @@ -1904,6 +1908,43 @@ execute_oacc_device_lower () case IFN_UNIQUE_OACC_TAIL_MARK: remove = true; break; + + case IFN_UNIQUE_OACC_PRIVATE: + { + HOST_WIDE_INT level + = TREE_INT_CST_LOW (gimple_call_arg (call, 2)); + if (level == -1) + break; + for (unsigned i = 3; + i < gimple_call_num_args (call); + i++) + { + tree arg = gimple_call_arg (call, i); + gcc_assert (TREE_CODE (arg) == ADDR_EXPR); + tree decl = TREE_OPERAND (arg, 0); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + static char const *const axes[] = + /* Must be kept in sync with GOMP_DIM + enumeration. 
*/ + { "gang", "worker", "vector" }; + fprintf (dump_file, "Decl UID %u has %s " + "partitioning:", DECL_UID (decl), + axes[level]); + print_generic_decl (dump_file, decl, TDF_SLIM); + fputc ('\n', dump_file); + } + if (targetm.goacc.adjust_private_decl) + { + tree oldtype = TREE_TYPE (decl); + targetm.goacc.adjust_private_decl (decl, level); + if (TREE_TYPE (decl) != oldtype) + adjusted_vars.add (decl); + } + } + remove = true; + } + break; } break; } @@ -1952,21 +1993,10 @@ execute_oacc_device_lower () uses (2). At least on AMD GCN, there are atomic operations that work directly in the LDS address space. */ - if (targetm.goacc.adjust_gangprivate_decl) + if (targetm.goacc.adjust_private_decl) { tree var; unsigned i; - hash_set adjusted_vars; - - FOR_EACH_LOCAL_DECL (cfun, i, var) - { - if (!VAR_P (var) - || !lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var))) - continue; - - targetm.goacc.adjust_gangprivate_decl (var); - adjusted_vars.add (var); - } FOR_ALL_BB_FN (bb, cfun) for (gimple_stmt_iterator gsi = gsi_start_bb (bb); diff --git a/gcc/omp-sese.c b/gcc/omp-sese.c index d7267017677..13d803fb1cd 100644 --- a/gcc/omp-sese.c +++ b/gcc/omp-sese.c @@ -713,19 +713,61 @@ find_partitioned_var_uses (parallel_g *par, unsigned outer_mask, } } +/* Gang-private variables (typically placed in a GPU's shared memory) do not + need to be processed by the worker-propagation mechanism. Populate the + GANGPRIVATE_VARS set with any such variables found in the current + function. 
*/ + +static void +find_gangprivate_vars (hash_set *gangprivate_vars) +{ + basic_block block; + + FOR_EACH_BB_FN (block, cfun) + { + for (gimple_stmt_iterator gsi = gsi_start_bb (block); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (gimple_call_internal_p (stmt, IFN_UNIQUE)) + { + enum ifn_unique_kind k = ((enum ifn_unique_kind) + TREE_INT_CST_LOW (gimple_call_arg (stmt, 0))); + if (k == IFN_UNIQUE_OACC_PRIVATE) + { + HOST_WIDE_INT level + = TREE_INT_CST_LOW (gimple_call_arg (stmt, 2)); + if (level != GOMP_DIM_GANG) + continue; + for (unsigned i = 3; i < gimple_call_num_args (stmt); i++) + { + tree arg = gimple_call_arg (stmt, i); + gcc_assert (TREE_CODE (arg) == ADDR_EXPR); + tree decl = TREE_OPERAND (arg, 0); + gangprivate_vars->add (decl); + } + } + } + } + } +} + static void find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask, hash_set *partitioned_var_uses, + hash_set *gangprivate_vars, vec *prop_set) { unsigned mask = outer_mask | par->mask; if (par->inner) find_local_vars_to_propagate (par->inner, mask, partitioned_var_uses, - prop_set); + gangprivate_vars, prop_set); if (par->next) find_local_vars_to_propagate (par->next, outer_mask, partitioned_var_uses, - prop_set); + gangprivate_vars, prop_set); if (!(mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))) { @@ -747,8 +789,7 @@ find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask, || is_global_var (var) || AGGREGATE_TYPE_P (TREE_TYPE (var)) || !partitioned_var_uses->contains (var) - || lookup_attribute ("oacc gangprivate", - DECL_ATTRIBUTES (var))) + || gangprivate_vars->contains (var)) continue; if (stmt_may_clobber_ref_p (stmt, var)) @@ -1353,9 +1394,12 @@ oacc_do_neutering (void) &prop_set); hash_set partitioned_var_uses; + hash_set gangprivate_vars; + find_gangprivate_vars (&gangprivate_vars); find_partitioned_var_uses (par, mask, &partitioned_var_uses); - find_local_vars_to_propagate (par, mask, &partitioned_var_uses, &prop_set); + 
find_local_vars_to_propagate (par, mask, &partitioned_var_uses, + &gangprivate_vars, &prop_set); FOR_ALL_BB_FN (bb, cfun) { diff --git a/gcc/target.def b/gcc/target.def index c9c3f650e8a..d4901389cbc 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1730,9 +1730,10 @@ rtx, (tree var), NULL) DEFHOOK -(adjust_gangprivate_decl, -"Tweak variable declaration for a gang-private variable.", -void, (tree var), +(adjust_private_decl, +"Tweak variable declaration for a private variable at the specified\n\ +parallelism level.", +void, (tree var, int), NULL) DEFHOOK diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 index 9158b6f4768..dafc70c743e 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 @@ -1,8 +1,8 @@ ! Test for "oacc gangprivate" attribute on gang-private variables ! { dg-do run } -! { dg-additional-options "-fdump-tree-omplower-details" } -! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl: integer\\(kind=4\\) w;" 1 "omplower" } } */ +! { dg-additional-options "-fdump-tree-oaccdevlow-details" } +! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has gang partitioning: integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */ program main integer :: w, arr(0:31) diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 index d147229d91e..90e06be24ff 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 @@ -1,8 +1,8 @@ -! Test for lack of "oacc gangprivate" attribute on worker-private variables +! Test for worker-private variables ! { dg-do run } -! { dg-additional-options "-fdump-tree-omplower-details" } -! 
{ dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl" 0 "omplower" } } */ +! { dg-additional-options "-fdump-tree-oaccdevlow-details" } +! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has worker partitioning: integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */ program main integer :: w, arr(0:31) @@ -13,7 +13,9 @@ program main w = 0 !$acc loop seq do i = 0, 31 + !$acc atomic update w = w + 1 + !$acc end atomic end do arr(j) = w end do
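
For readers who want to play with the level-filtering idea outside the compiler, here is a rough standalone sketch, not code from the patch. It models an OACC_PRIVATE marker as a plain struct holding the partitioning level (the marker's third argument, -1 while still unresolved) and the listed variable names, then counts the gang-level entries in the spirit of find_gangprivate_vars. The sketch_* names and the struct layout are assumptions made up for illustration; only the gang/worker/vector ordering is taken from the "axes" table in the omp-offload.c hunk.

```c
#include <stddef.h>

/* Partitioning axes in the order used by the patch's "axes" dump table
   (gang, worker, vector).  These enumerators are local to this sketch.  */
enum sketch_dim
{
  SKETCH_DIM_GANG = 0,
  SKETCH_DIM_WORKER = 1,
  SKETCH_DIM_VECTOR = 2
};

/* Hypothetical stand-in for one OACC_PRIVATE marker: a partitioning level
   (-1 while still unresolved) plus the names of the variables it lists.  */
struct sketch_marker
{
  int level;
  size_t nvars;
  const char *const *vars;
};

/* Return a printable axis name, or NULL if LEVEL is unresolved or out of
   range.  */
const char *
sketch_axis_name (int level)
{
  static const char *const axes[] = { "gang", "worker", "vector" };

  if (level < 0 || level > SKETCH_DIM_VECTOR)
    return NULL;
  return axes[level];
}

/* Count the variables listed by gang-level markers, skipping markers at
   any other level (including unresolved ones).  */
size_t
sketch_count_gang_private (const struct sketch_marker *markers, size_t n)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (markers[i].level == SKETCH_DIM_GANG)
      count += markers[i].nvars;
  return count;
}
```

Calling sketch_count_gang_private on an array that mixes gang, worker and unresolved markers only counts the gang-level variables, which is the property the neutering code relies on when it decides which variables can skip worker propagation.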