From patchwork Fri Feb 26 12:34:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1444918 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Dn8J62wWYz9sCD for ; Fri, 26 Feb 2021 23:35:18 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A8B303973103; Fri, 26 Feb 2021 12:35:12 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id C24D63950434 for ; Fri, 26 Feb 2021 12:35:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C24D63950434 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Julian_Brown@mentor.com IronPort-SDR: okT0l7ZYA464X8GL+Ddn0cWkf11MJ49XEofpY6cw6qgKOr0cU4Oi0elsvrQwtf2s1clb7fQqW3 Jwv1Y6lbJgXpfwZDg49o7BbmWSwyoT4pWsJ3mkh+u33tvyNSllIUPePAJ/Lgc4wmRakuDvfB98 Uw+D3heUJRgrKI/qiM5NII9fxseuzllUT9Y6WoBwYJO/cOnC7oBpIly1S+KrnYAgoNpuRap3Jl lhx07/tjlAIBNmIftoNr4ZjNUI5YgnVvHdGeTxVbqH3IzbF8mnr4jDhYmjiIusGwpl0h/sOSry MFQ= X-IronPort-AV: E=Sophos;i="5.81,208,1610438400"; d="scan'208";a="58524563" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 26 Feb 2021 04:35:06 -0800 IronPort-SDR: EHDhx5FL015tKkqKwos4/jNuwfwPEOdXIes7lKEzU1KnWGz8p2ruPb/Jc3XrI5CG9HNhSLqsYT ap+HBw/ZiIQalmR/RTkI02ZR9G/wHOWD/3yA+s66o84qZgZz0EychuewUBD4rGF56iBhkzJLLx ik9J9ZOnxx2YHrXfC11zYTphd9NKQL2EwWQ6tqM7UHgDgy+0YoazmgmRbl7H/CWEfY4XOPtbub Qq46clYiTXnOFjXPz4BR6rT8qAMUyvx7IWw1jxRZXX1n/n6uzkuqjzEQmE7zo6WkGPHm0xO95l TQA= From: Julian Brown To: Subject: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory Date: Fri, 26 Feb 2021 04:34:50 -0800 Message-ID: X-Mailer: git-send-email 2.29.2 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jakub@redhat.com, Thomas Schwinge Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" This patch implements a method to track the "private-ness" of OpenACC variables declared in offload regions in gang-partitioned, worker-partitioned or vector-partitioned modes. Variables declared implicitly in scoped blocks and those declared "private" on enclosing directives (e.g. "acc parallel") are both handled. Variables that are e.g. gang-private can then be adjusted so they reside in GPU shared memory. The reason for doing this is twofold: correct implementation of OpenACC semantics, and optimisation, since shared memory might be faster than the main memory on a GPU. Handling of private variables is intimately tied to the execution model for gangs/workers/vectors implemented by a particular target: for current targets, we use (or on mainline, will soon use) a broadcasting/neutering scheme. That is sufficient for code that e.g. sets a variable in worker-single mode and expects to use the value in worker-partitioned mode. The difficulty (semantics-wise) comes when the user wants to do something like an atomic operation in worker-partitioned mode and expects a worker-single (gang private) variable to be shared across each partitioned worker. Forcing use of shared memory for such variables makes that work properly. In terms of implementation, the parallelism level of a given loop is not fixed until the oaccdevlow pass in the offload compiler, so the patch delays fixing the parallelism level of variables declared on or within such loops until the same point. This is done by adding a new internal UNIQUE function (OACC_PRIVATE) that lists (the address of) each private variable as an argument, and other arguments set so as to be able to determine the correct parallelism level to use for the listed variables. This new internal function fits into the existing scheme for demarcating OpenACC loops, as described in comments in the patch. Two new target hooks are introduced: TARGET_GOACC_ADJUST_PRIVATE_DECL and TARGET_GOACC_EXPAND_VAR_DECL. The first can tweak a variable declaration at oaccdevlow time, and the second at expand time. The first or both of these target hooks can be used by a given offload target, depending on its strategy for implementing private variables. Tested with offloading to AMD GCN and (separately) to NVPTX. OK (for stage 1)? Thanks, Julian 2021-02-22 Julian Brown Chung-Lin Tang gcc/ * doc/tm.texi.in (TARGET_GOACC_EXPAND_VAR_DECL, TARGET_GOACC_ADJUST_PRIVATE_DECL): Add documentation hooks. * doc/tm.texi: Regenerate. * expr.c (expand_expr_real_1): Expand decls using the expand_var_decl OpenACC hook if defined. * internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE. * internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE. * omp-low.c (omp_context): Add oacc_addressable_var_decls field. (lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert before fork. (lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify private marker's gimple call arguments, and pass it to lower_oacc_reductions. (oacc_record_private_var_clauses, oacc_record_vars_in_bind, make_oacc_private_marker): New functions. (lower_omp_for): Call oacc_record_private_var_clauses with "for" clauses. Call oacc_record_vars_in_bind for OpenACC contexts. Create private marker and pass to lower_oacc_head_tail. (lower_omp_target): Create private marker and pass to lower_oacc_reductions. (lower_omp_1): Call oacc_record_vars_in_bind for OpenACC. * omp-offload.c (convert.h): Include. (oacc_loop_xform_head_tail): Treat private-variable markers like fork/join when transforming head/tail sequences. (struct addr_expr_rewrite_info): Add struct. (rewrite_addr_expr): New function. (is_sync_builtin_call): New function. (execute_oacc_device_lower): Support rewriting gang-private variables using target hook, and fix up addr_expr and var_decl nodes afterwards. * target.def (expand_accel_var, adjust_private_decl): New hooks. libgomp/ * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise. * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: Likewise. --- gcc/doc/tm.texi | 26 ++ gcc/doc/tm.texi.in | 4 + gcc/expr.c | 13 +- gcc/internal-fn.c | 2 + gcc/internal-fn.h | 3 +- gcc/omp-low.c | 122 +++++++++- gcc/omp-offload.c | 225 +++++++++++++++++- gcc/target.def | 30 +++ .../gang-private-1.c | 38 +++ .../libgomp.oacc-c-c++-common/loop-gwv-2.c | 95 ++++++++ .../gangprivate-attrib-1.f90 | 25 ++ .../gangprivate-attrib-2.f90 | 25 ++ 12 files changed, 599 insertions(+), 9 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 062785af1e2..94927ea7b2b 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6227,6 +6227,32 @@ like @code{cond_add@var{m}}. The default implementation returns a zero constant of type @var{type}. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_VAR_DECL (tree @var{var}) +This hook, if defined, is used by accelerator target back-ends to expand +specially handled kinds of @code{VAR_DECL} expressions. A particular use is +to place variables with specific attributes inside special accelarator +memories. A return value of @code{NULL} indicates that the target does not +handle this @code{VAR_DECL}, and normal RTL expanding is resumed. + +Only define this hook if your accelerator target needs to expand certain +@code{VAR_DECL} nodes in a way that differs from the default. You can also adjust +private variables at OpenACC device-lowering time using the +@code{TARGET_GOACC_ADJUST_PRIVATE_DECL} target hook. +@end deftypefn + +@deftypefn {Target Hook} tree TARGET_GOACC_ADJUST_PRIVATE_DECL (tree @var{var}, int @var{level}) +This hook, if defined, is used by accelerator target back-ends to adjust +OpenACC variable declarations that should be made private to the given +parallelism level (i.e. @code{GOMP_DIM_GANG}, @code{GOMP_DIM_WORKER} or +@code{GOMP_DIM_VECTOR}). A typical use for this hook is to force variable +declarations at the @code{gang} level to reside in GPU shared memory, by +setting the address space of the decl and making it static. + +You may also use the @code{TARGET_GOACC_EXPAND_VAR_DECL} hook if the +adjusted variable declaration needs to be expanded to RTL in a non-standard +way. +@end deftypefn + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 3b19e6f4281..b8c23cf6db5 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4219,6 +4219,10 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_PREFERRED_ELSE_VALUE +@hook TARGET_GOACC_EXPAND_VAR_DECL + +@hook TARGET_GOACC_ADJUST_PRIVATE_DECL + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses diff --git a/gcc/expr.c b/gcc/expr.c index 86dc1b6c973..349825cf286 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -10224,8 +10224,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, exp = SSA_NAME_VAR (ssa_name); goto expand_decl_rtl; - case PARM_DECL: case VAR_DECL: + /* Allow accel compiler to handle variables that require special + treatment, e.g. if they have been modified in some way earlier in + compilation by the adjust_private_decl OpenACC hook. */ + if (flag_openacc && targetm.goacc.expand_var_decl) + { + temp = targetm.goacc.expand_var_decl (exp); + if (temp) + return temp; + } + /* ... fall through ... */ + + case PARM_DECL: /* If a static var's type was incomplete when the decl was written, but the type is complete now, lay out the decl now. */ if (DECL_SIZE (exp) == 0 diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index dd7173126fb..e6611e8572f 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2957,6 +2957,8 @@ expand_UNIQUE (internal_fn, gcall *stmt) else gcc_unreachable (); break; + case IFN_UNIQUE_OACC_PRIVATE: + break; } if (pattern) diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index c6599ce4894..9004840e0f5 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3. If not see #define IFN_UNIQUE_CODES \ DEF(UNSPEC), \ DEF(OACC_FORK), DEF(OACC_JOIN), \ - DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK) + DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK), \ + DEF(OACC_PRIVATE) enum ifn_unique_kind { #define DEF(X) IFN_UNIQUE_##X diff --git a/gcc/omp-low.c b/gcc/omp-low.c index df5b6cec586..fd8025e0e3f 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -171,6 +171,9 @@ struct omp_context /* True if there is bind clause on the construct (i.e. a loop construct). */ bool loop_p; + + /* Addressable variable decls in this context. */ + vec oacc_addressable_var_decls; }; static splay_tree all_contexts; @@ -7048,8 +7051,9 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *body_p, static void lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, - gcall *fork, gcall *join, gimple_seq *fork_seq, - gimple_seq *join_seq, omp_context *ctx) + gcall *fork, gcall *private_marker, gcall *join, + gimple_seq *fork_seq, gimple_seq *join_seq, + omp_context *ctx) { gimple_seq before_fork = NULL; gimple_seq after_fork = NULL; @@ -7253,6 +7257,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, /* Now stitch things together. */ gimple_seq_add_seq (fork_seq, before_fork); + if (private_marker) + gimple_seq_add_stmt (fork_seq, private_marker); if (fork) gimple_seq_add_stmt (fork_seq, fork); gimple_seq_add_seq (fork_seq, after_fork); @@ -7989,7 +7995,7 @@ lower_oacc_loop_marker (location_t loc, tree ddvar, bool head, HEAD and TAIL. */ static void -lower_oacc_head_tail (location_t loc, tree clauses, +lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker, gimple_seq *head, gimple_seq *tail, omp_context *ctx) { bool inner = false; @@ -7997,6 +8003,14 @@ lower_oacc_head_tail (location_t loc, tree clauses, gimple_seq_add_stmt (head, gimple_build_assign (ddvar, integer_zero_node)); unsigned count = lower_oacc_head_mark (loc, ddvar, clauses, head, ctx); + + if (private_marker) + { + gimple_set_location (private_marker, loc); + gimple_call_set_lhs (private_marker, ddvar); + gimple_call_set_arg (private_marker, 1, ddvar); + } + tree fork_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK); tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN); @@ -8027,7 +8041,8 @@ lower_oacc_head_tail (location_t loc, tree clauses, &join_seq); lower_oacc_reductions (loc, clauses, place, inner, - fork, join, &fork_seq, &join_seq, ctx); + fork, (count == 1) ? private_marker : NULL, + join, &fork_seq, &join_seq, ctx); /* Append this level to head. */ gimple_seq_add_seq (head, fork_seq); @@ -9992,6 +10007,32 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p, } } +/* Record vars listed in private clauses in CLAUSES in CTX. This information + is used to mark up variables that should be made private per-gang. */ + +static void +oacc_record_private_var_clauses (omp_context *ctx, tree clauses) +{ + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE) + { + tree decl = OMP_CLAUSE_DECL (c); + if (VAR_P (decl) && TREE_ADDRESSABLE (decl)) + ctx->oacc_addressable_var_decls.safe_push (decl); + } +} + +/* Record addressable vars declared in BINDVARS in CTX. This information is + used to mark up variables that should be made private per-gang. */ + +static void +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars) +{ + for (tree v = bindvars; v; v = DECL_CHAIN (v)) + if (VAR_P (v) && TREE_ADDRESSABLE (v)) + ctx->oacc_addressable_var_decls.safe_push (v); +} + /* Callback for walk_gimple_seq. Find #pragma omp scan statement. */ static tree @@ -10821,6 +10862,57 @@ lower_omp_for_scan (gimple_seq *body_p, gimple_seq *dlist, gomp_for *stmt, *dlist = new_dlist; } +/* Build an internal UNIQUE function with type IFN_UNIQUE_OACC_PRIVATE listing + the addresses of variables that should be made private at the surrounding + parallelism level. Such functions appear in the gimple code stream in two + forms, e.g. for a partitioned loop: + + .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6, 1, 68); + .data_dep.6 = .UNIQUE (OACC_PRIVATE, .data_dep.6, -1, &w); + .data_dep.6 = .UNIQUE (OACC_FORK, .data_dep.6, -1); + .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6); + + or alternatively, OACC_PRIVATE can appear at the top level of a parallel, + not as part of a HEAD_MARK sequence: + + .UNIQUE (OACC_PRIVATE, 0, 0, &w); + + For such stand-alone appearances, the 3rd argument is always 0, denoting + gang partitioning. */ + +static gcall * +make_oacc_private_marker (omp_context *ctx) +{ + int i; + tree decl; + + if (ctx->oacc_addressable_var_decls.length () == 0) + return NULL; + + auto_vec args; + + args.quick_push (build_int_cst (integer_type_node, IFN_UNIQUE_OACC_PRIVATE)); + args.quick_push (integer_zero_node); + args.quick_push (integer_minus_one_node); + + FOR_EACH_VEC_ELT (ctx->oacc_addressable_var_decls, i, decl) + { + for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer) + { + tree inner_decl = maybe_lookup_decl (decl, thisctx); + if (inner_decl) + { + decl = inner_decl; + break; + } + } + tree addr = build_fold_addr_expr (decl); + args.safe_push (addr); + } + + return gimple_build_call_internal_vec (IFN_UNIQUE, args); +} + /* Lower code for an OMP loop directive. */ static void @@ -10837,6 +10929,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) push_gimplify_context (); + oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt)); + lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx); block = make_node (BLOCK); @@ -10855,6 +10949,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) gbind *inner_bind = as_a (gimple_seq_first_stmt (omp_for_body)); tree vars = gimple_bind_vars (inner_bind); + if (is_gimple_omp_oacc (ctx->stmt)) + oacc_record_vars_in_bind (ctx, vars); gimple_bind_append_vars (new_stmt, vars); /* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't keep them on the inner_bind and it's block. */ @@ -10968,6 +11064,11 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) lower_omp (gimple_omp_body_ptr (stmt), ctx); + gcall *private_marker = NULL; + if (is_gimple_omp_oacc (ctx->stmt) + && !gimple_seq_empty_p (omp_for_body)) + private_marker = make_oacc_private_marker (ctx); + /* Lower the header expressions. At this point, we can assume that the header is of the form: @@ -11022,7 +11123,7 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) if (is_gimple_omp_oacc (ctx->stmt) && !ctx_in_oacc_kernels_region (ctx)) lower_oacc_head_tail (gimple_location (stmt), - gimple_omp_for_clauses (stmt), + gimple_omp_for_clauses (stmt), private_marker, &oacc_head, &oacc_tail, ctx); /* Add OpenACC partitioning and reduction markers just before the loop. */ @@ -13019,8 +13120,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) them as a dummy GANG loop. */ tree level = build_int_cst (integer_type_node, GOMP_DIM_GANG); + gcall *private_marker = make_oacc_private_marker (ctx); + + if (private_marker) + gimple_call_set_arg (private_marker, 2, level); + lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level, - false, NULL, NULL, &fork_seq, &join_seq, ctx); + false, NULL, private_marker, NULL, &fork_seq, + &join_seq, ctx); } gimple_seq_add_seq (&new_body, fork_seq); @@ -13262,6 +13369,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) ctx); break; case GIMPLE_BIND: + if (ctx && is_gimple_omp_oacc (ctx->stmt)) + oacc_record_vars_in_bind (ctx, + gimple_bind_vars (as_a (stmt))); lower_omp (gimple_bind_body_ptr (as_a (stmt)), ctx); maybe_remove_omp_member_access_dummy_vars (as_a (stmt)); break; diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 57be342da97..b3f543b597a 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include "attribs.h" #include "cfgloop.h" #include "context.h" +#include "convert.h" /* Describe the OpenACC looping structure of a function. The entire function is held in a 'NULL' loop. */ @@ -1356,7 +1357,9 @@ oacc_loop_xform_head_tail (gcall *from, int level) = ((enum ifn_unique_kind) TREE_INT_CST_LOW (gimple_call_arg (stmt, 0))); - if (k == IFN_UNIQUE_OACC_FORK || k == IFN_UNIQUE_OACC_JOIN) + if (k == IFN_UNIQUE_OACC_FORK + || k == IFN_UNIQUE_OACC_JOIN + || k == IFN_UNIQUE_OACC_PRIVATE) *gimple_call_arg_ptr (stmt, 2) = replacement; else if (k == kind && stmt != from) break; @@ -1773,6 +1776,136 @@ default_goacc_reduction (gcall *call) gsi_replace_with_seq (&gsi, seq, true); } +struct var_decl_rewrite_info +{ + gimple *stmt; + hash_map *adjusted_vars; + bool avoid_pointer_conversion; + bool modified; +}; + +/* Helper function for execute_oacc_device_lower. Rewrite VAR_DECLs (by + themselves or wrapped in various other nodes) according to ADJUSTED_VARS in + the var_decl_rewrite_info pointed to via DATA. Used as part of coercing + gang-private variables in OpenACC offload regions to reside in GPU shared + memory. */ + +static tree +oacc_rewrite_var_decl (tree *tp, int *walk_subtrees, void *data) +{ + walk_stmt_info *wi = (walk_stmt_info *) data; + var_decl_rewrite_info *info = (var_decl_rewrite_info *) wi->info; + + if (TREE_CODE (*tp) == ADDR_EXPR) + { + tree arg = TREE_OPERAND (*tp, 0); + tree *new_arg = info->adjusted_vars->get (arg); + + if (new_arg) + { + if (info->avoid_pointer_conversion) + { + *tp = build_fold_addr_expr (*new_arg); + info->modified = true; + *walk_subtrees = 0; + } + else + { + gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt); + tree repl = build_fold_addr_expr (*new_arg); + gimple *stmt1 + = gimple_build_assign (make_ssa_name (TREE_TYPE (repl)), repl); + tree conv = convert_to_pointer (TREE_TYPE (*tp), + gimple_assign_lhs (stmt1)); + gimple *stmt2 + = gimple_build_assign (make_ssa_name (TREE_TYPE (*tp)), conv); + gsi_insert_before (&gsi, stmt1, GSI_SAME_STMT); + gsi_insert_before (&gsi, stmt2, GSI_SAME_STMT); + *tp = gimple_assign_lhs (stmt2); + info->modified = true; + *walk_subtrees = 0; + } + } + } + else if (TREE_CODE (*tp) == COMPONENT_REF || TREE_CODE (*tp) == ARRAY_REF) + { + tree *base = &TREE_OPERAND (*tp, 0); + + while (TREE_CODE (*base) == COMPONENT_REF + || TREE_CODE (*base) == ARRAY_REF) + base = &TREE_OPERAND (*base, 0); + + if (TREE_CODE (*base) != VAR_DECL) + return NULL; + + tree *new_decl = info->adjusted_vars->get (*base); + if (!new_decl) + return NULL; + + int base_quals = TYPE_QUALS (TREE_TYPE (*new_decl)); + tree field = TREE_OPERAND (*tp, 1); + + /* Adjust the type of the field. */ + int field_quals = TYPE_QUALS (TREE_TYPE (field)); + if (TREE_CODE (field) == FIELD_DECL && field_quals != base_quals) + { + tree *field_type = &TREE_TYPE (field); + while (TREE_CODE (*field_type) == ARRAY_TYPE) + field_type = &TREE_TYPE (*field_type); + field_quals |= base_quals; + *field_type = build_qualified_type (*field_type, field_quals); + } + + /* Adjust the type of the component ref itself. */ + tree comp_type = TREE_TYPE (*tp); + int comp_quals = TYPE_QUALS (comp_type); + if (TREE_CODE (*tp) == COMPONENT_REF && comp_quals != base_quals) + { + comp_quals |= base_quals; + TREE_TYPE (*tp) + = build_qualified_type (comp_type, comp_quals); + } + + *base = *new_decl; + info->modified = true; + } + else if (TREE_CODE (*tp) == VAR_DECL) + { + tree *new_decl = info->adjusted_vars->get (*tp); + if (new_decl) + { + *tp = *new_decl; + info->modified = true; + } + } + + return NULL_TREE; +} + +/* Return TRUE if CALL is a call to a builtin atomic/sync operation. */ + +static bool +is_sync_builtin_call (gcall *call) +{ + tree callee = gimple_call_fndecl (call); + + if (callee != NULL_TREE + && gimple_call_builtin_p (call, BUILT_IN_NORMAL)) + switch (DECL_FUNCTION_CODE (callee)) + { +#undef DEF_SYNC_BUILTIN +#define DEF_SYNC_BUILTIN(ENUM, NAME, TYPE, ATTRS) case ENUM: +#include "sync-builtins.def" +#undef DEF_SYNC_BUILTIN + return true; + + default: + ; + } + + return false; +} + /* Main entry point for oacc transformations which run on the device compiler after LTO, so we know what the target device is at this point (including the host fallback). */ @@ -1922,6 +2055,8 @@ execute_oacc_device_lower () dominance information to update SSA. */ calculate_dominance_info (CDI_DOMINATORS); + hash_map adjusted_vars; + /* Now lower internal loop functions to target-specific code sequences. */ basic_block bb; @@ -1998,6 +2133,45 @@ execute_oacc_device_lower () case IFN_UNIQUE_OACC_TAIL_MARK: remove = true; break; + + case IFN_UNIQUE_OACC_PRIVATE: + { + HOST_WIDE_INT level + = TREE_INT_CST_LOW (gimple_call_arg (call, 2)); + if (level == -1) + break; + for (unsigned i = 3; + i < gimple_call_num_args (call); + i++) + { + tree arg = gimple_call_arg (call, i); + gcc_assert (TREE_CODE (arg) == ADDR_EXPR); + tree decl = TREE_OPERAND (arg, 0); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + static char const *const axes[] = + /* Must be kept in sync with GOMP_DIM + enumeration. */ + { "gang", "worker", "vector" }; + fprintf (dump_file, "Decl UID %u has %s " + "partitioning:", DECL_UID (decl), + axes[level]); + print_generic_decl (dump_file, decl, TDF_SLIM); + fputc ('\n', dump_file); + } + if (targetm.goacc.adjust_private_decl) + { + tree oldtype = TREE_TYPE (decl); + tree newdecl + = targetm.goacc.adjust_private_decl (decl, level); + if (TREE_TYPE (newdecl) != oldtype + || newdecl != decl) + adjusted_vars.put (decl, newdecl); + } + } + remove = true; + } + break; } break; } @@ -2029,6 +2203,55 @@ execute_oacc_device_lower () gsi_next (&gsi); } + /* Make adjustments to gang-private local variables if required by the + target, e.g. forcing them into a particular address space. Afterwards, + ADDR_EXPR nodes which have adjusted variables as their argument need to + be modified in one of two ways: + + 1. They can be recreated, making a pointer to the variable in the new + address space, or + + 2. The address of the variable in the new address space can be taken, + converted to the default (original) address space, and the result of + that conversion subsituted in place of the original ADDR_EXPR node. + + Which of these is done depends on the gimple statement being processed. + At present atomic operations and inline asms use (1), and everything else + uses (2). At least on AMD GCN, there are atomic operations that work + directly in the LDS address space. + + COMPONENT_REFS, ARRAY_REFS and plain VAR_DECLs are also rewritten to use + the new decl, adjusting types of appropriate tree nodes as necessary. */ + + if (targetm.goacc.adjust_private_decl) + { + FOR_ALL_BB_FN (bb, cfun) + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + walk_stmt_info wi; + var_decl_rewrite_info info; + + info.avoid_pointer_conversion + = (is_gimple_call (stmt) + && is_sync_builtin_call (as_a (stmt))) + || gimple_code (stmt) == GIMPLE_ASM; + info.stmt = stmt; + info.modified = false; + info.adjusted_vars = &adjusted_vars; + + memset (&wi, 0, sizeof (wi)); + wi.info = &info; + + walk_gimple_op (stmt, oacc_rewrite_var_decl, &wi); + + if (info.modified) + update_stmt (stmt); + } + } + free_oacc_loop (loops); return 0; diff --git a/gcc/target.def b/gcc/target.def index be7fcde961a..00b6f8f1bc9 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1712,6 +1712,36 @@ for allocating any storage for reductions when necessary.", void, (gcall *call), default_goacc_reduction) +DEFHOOK +(expand_var_decl, +"This hook, if defined, is used by accelerator target back-ends to expand\n\ +specially handled kinds of @code{VAR_DECL} expressions. A particular use is\n\ +to place variables with specific attributes inside special accelarator\n\ +memories. A return value of @code{NULL} indicates that the target does not\n\ +handle this @code{VAR_DECL}, and normal RTL expanding is resumed.\n\ +\n\ +Only define this hook if your accelerator target needs to expand certain\n\ +@code{VAR_DECL} nodes in a way that differs from the default. You can also adjust\n\ +private variables at OpenACC device-lowering time using the\n\ +@code{TARGET_GOACC_ADJUST_PRIVATE_DECL} target hook.", +rtx, (tree var), +NULL) + +DEFHOOK +(adjust_private_decl, +"This hook, if defined, is used by accelerator target back-ends to adjust\n\ +OpenACC variable declarations that should be made private to the given\n\ +parallelism level (i.e. @code{GOMP_DIM_GANG}, @code{GOMP_DIM_WORKER} or\n\ +@code{GOMP_DIM_VECTOR}). A typical use for this hook is to force variable\n\ +declarations at the @code{gang} level to reside in GPU shared memory, by\n\ +setting the address space of the decl and making it static.\n\ +\n\ +You may also use the @code{TARGET_GOACC_EXPAND_VAR_DECL} hook if the\n\ +adjusted variable declaration needs to be expanded to RTL in a non-standard\n\ +way.", +tree, (tree var, int level), +NULL) + HOOK_VECTOR_END (goacc) /* Functions relating to vectorization. */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c new file mode 100644 index 00000000000..28222c25da3 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c @@ -0,0 +1,38 @@ +#include + +int main (void) +{ + int ret; + + #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret) + { + int w = 0; + + #pragma acc loop worker + for (int i = 0; i < 32; i++) + { + #pragma acc atomic update + w++; + } + + ret = (w == 32); + } + assert (ret); + + #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret) + { + int v = 0; + + #pragma acc loop vector + for (int i = 0; i < 32; i++) + { + #pragma acc atomic update + v++; + } + + ret = (v == 32); + } + assert (ret); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c new file mode 100644 index 00000000000..a4f81a39e24 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c @@ -0,0 +1,95 @@ +#include +#include +#include +#include +#include +#include + +#if 0 +#define DEBUG(DIM, IDX, VAL) \ + fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL)) +#else +#define DEBUG(DIM, IDX, VAL) +#endif + +#define N (32*32*32) + +int +check (const char *dim, int *dist, int dimsize) +{ + int ix; + int exit = 0; + + for (ix = 0; ix < dimsize; ix++) + { + DEBUG(dim, ix, dist[ix]); + if (dist[ix] < (N) / (dimsize + 0.5) + || dist[ix] > (N) / (dimsize - 0.5)) + { + fprintf (stderr, "did not distribute to %ss (%d not between %d " + "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)), + (int) ((N) / (dimsize - 0.5))); + exit |= 1; + } + } + + return exit; +} + +int main () +{ + int ary[N]; + int ix; + int exit = 0; + int gangsize = 0, workersize = 0, vectorsize = 0; + int *gangdist, *workerdist, *vectordist; + + for (ix = 0; ix < N;ix++) + ary[ix] = -1; + +#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \ + copy(ary) copyout(gangsize, workersize, vectorsize) + { +#pragma acc loop gang worker vector + for (unsigned ix = 0; ix < N; ix++) + { + int g, w, v; + + g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG); + w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER); + v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR); + + ary[ix] = (g << 16) | (w << 8) | v; + } + + gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG); + workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER); + vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR); + } + + gangdist = (int *) alloca (gangsize * sizeof (int)); + workerdist = (int *) alloca (workersize * sizeof (int)); + vectordist = (int *) alloca (vectorsize * sizeof (int)); + memset (gangdist, 0, gangsize * sizeof (int)); + memset (workerdist, 0, workersize * sizeof (int)); + memset (vectordist, 0, vectorsize * sizeof (int)); + + /* Test that work is shared approximately equally amongst each active + gang/worker/vector. */ + for (ix = 0; ix < N; ix++) + { + int g = (ary[ix] >> 16) & 255; + int w = (ary[ix] >> 8) & 255; + int v = ary[ix] & 255; + + gangdist[g]++; + workerdist[w]++; + vectordist[v]++; + } + + exit = check ("gang", gangdist, gangsize); + exit |= check ("worker", workerdist, workersize); + exit |= check ("vector", vectordist, vectorsize); + + return exit; +} diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 new file mode 100644 index 00000000000..f330f7de1be --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 @@ -0,0 +1,25 @@ +! Test for "oacc gangprivate" attribute on gang-private variables + +! { dg-do run } +! { dg-additional-options "-fdump-tree-oaccdevlow-details -w" } + +program main + integer :: w, arr(0:31) + + !$acc parallel num_gangs(32) num_workers(32) copyout(arr) + !$acc loop gang private(w) +! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has gang partitioning: integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */ + do j = 0, 31 + w = 0 + !$acc loop seq + do i = 0, 31 + !$acc atomic update + w = w + 1 + !$acc end atomic + end do + arr(j) = w + end do + !$acc end parallel + + if (any (arr .ne. 32)) stop 1 +end program main diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 new file mode 100644 index 00000000000..f4e67b0c708 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 @@ -0,0 +1,25 @@ +! Test for worker-private variables + +! { dg-do run } +! { dg-additional-options "-fdump-tree-oaccdevlow-details" } + +program main + integer :: w, arr(0:31) + + !$acc parallel num_gangs(32) num_workers(32) copyout(arr) + !$acc loop gang worker private(w) +! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has worker partitioning: integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */ + do j = 0, 31 + w = 0 + !$acc loop seq + do i = 0, 31 + !$acc atomic update + w = w + 1 + !$acc end atomic + end do + arr(j) = w + end do + !$acc end parallel + + if (any (arr .ne. 32)) stop 1 +end program main From patchwork Fri Feb 26 12:34:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1444917 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Dn8J229wNz9sCD for ; Fri, 26 Feb 2021 23:35:14 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4123B39730EA; Fri, 26 Feb 2021 12:35:10 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 0C1CE3842424 for ; Fri, 26 Feb 2021 12:35:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 0C1CE3842424 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Julian_Brown@mentor.com IronPort-SDR: 7X9MAgYnKCldpXv45NwAEcE10eAY6/LM2YavkeXS8bw6xAc9RGVdMekdPhLwiYawGycnjzA4u+ pA3e1TVdzHl84soNpNvg12NqDUaLAjZr3IzcuisKuD63k7yvk2MHVfZD3jb1Dvoe7a49cAr8iP njzhUw1ihinHJh08JdWCBrP7cZTHYqtl0r9uZ2I8umMLI4oZKjQZDzmpIcS6R+hRY4bAZRwdfL F0lvtYakfUj6+uqdGWEXwwYkUy64begSoHZNHpLcM8BobnaN1IKUuMoncPp7ZAu0NusQey7dVS on0= X-IronPort-AV: E=Sophos;i="5.81,208,1610438400"; d="scan'208";a="58524565" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 26 Feb 2021 04:35:07 -0800 IronPort-SDR: vWjQmC7WzwX8KlejwU3kzv8arHgQWDRtPH44DL9mDlRRRiyJLuMJ3bMKxvBJTdugUwTveSnb0u qaT7M+3ogf9HdNJzvANQq2D+DtRMNoKE+vsrD8qqs9W+6gI0sQiq66snF4rn4pAf3mBAWjn0pT WeOA2lbACoBzC9/X8SA+F4XMrv5lr2thQxOwBqIFUfhlLm32/Dp/I48LS7NWxxlZSgQSofCKwK srt1XXuGGny6atdSbNwpFWWFqrJT54PiIY1plpBrNKr98u2p8B8wCFlPDZwjB/fUDURcirkMmE F+E= From: Julian Brown To: Subject: [PATCH 2/3] amdgcn: AMD GCN parts for OpenACC private variables patch Date: Fri, 26 Feb 2021 04:34:51 -0800 Message-ID: X-Mailer: git-send-email 2.29.2 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jakub@redhat.com, Thomas Schwinge Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" This patch updates the TARGET_GOACC_ADJUST_PRIVATE_DECL target hook in the AMD GCN backend to the current name and prototype. (An earlier version of the hook was already present, but dormant.) (I can self-approve this. I will commit as/when the previous patch is approved.) Thanks, Julian gcc/ * config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename to... (gcn_goacc_adjust_private_decl): ...this. * config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl): Rename to... (gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter. * config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename definition using gcn_goacc_adjust_gangprivate_decl... (TARGET_GOACC_ADJUST_PRIVATE_DECL): ...to this, using gcn_goacc_adjust_private_decl. --- gcc/config/gcn/gcn-protos.h | 2 +- gcc/config/gcn/gcn-tree.c | 9 +++++++-- gcc/config/gcn/gcn.c | 4 ++-- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h index dc9331c445d..7ef7ae8af46 100644 --- a/gcc/config/gcn/gcn-protos.h +++ b/gcc/config/gcn/gcn-protos.h @@ -40,7 +40,7 @@ extern rtx gcn_gen_undef (machine_mode); extern bool gcn_global_address_p (rtx); extern tree gcn_goacc_adjust_propagation_record (tree record_type, bool sender, const char *name); -extern void gcn_goacc_adjust_gangprivate_decl (tree var); +extern tree gcn_goacc_adjust_private_decl (tree var, int level); extern void gcn_goacc_reduction (gcall *call); extern bool gcn_hard_regno_rename_ok (unsigned int from_reg, unsigned int to_reg); diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c index 8f270991c86..75ea50c59dd 100644 --- a/gcc/config/gcn/gcn-tree.c +++ b/gcc/config/gcn/gcn-tree.c @@ -577,9 +577,12 @@ gcn_goacc_adjust_propagation_record (tree record_type, bool sender, return decl; } -void -gcn_goacc_adjust_gangprivate_decl (tree var) +tree +gcn_goacc_adjust_private_decl (tree var, int level) { + if (level != GOMP_DIM_GANG) + return var; + tree type = TREE_TYPE (var); tree lds_type = build_qualified_type (type, TYPE_QUALS_NO_ADDR_SPACE (type) @@ -597,6 +600,8 @@ gcn_goacc_adjust_gangprivate_decl (tree var) if (machfun) machfun->use_flat_addressing = true; + + return var; } /* }}} */ diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index e8bb0b63756..1ea919bf058 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -6317,8 +6317,8 @@ gcn_dwarf_register_span (rtx rtl) #undef TARGET_GOACC_ADJUST_PROPAGATION_RECORD #define TARGET_GOACC_ADJUST_PROPAGATION_RECORD \ gcn_goacc_adjust_propagation_record -#undef TARGET_GOACC_ADJUST_GANGPRIVATE_DECL -#define TARGET_GOACC_ADJUST_GANGPRIVATE_DECL gcn_goacc_adjust_gangprivate_decl +#undef TARGET_GOACC_ADJUST_PRIVATE_DECL +#define TARGET_GOACC_ADJUST_PRIVATE_DECL gcn_goacc_adjust_private_decl #undef TARGET_GOACC_FORK_JOIN #define TARGET_GOACC_FORK_JOIN gcn_fork_join #undef TARGET_GOACC_REDUCTION From patchwork Fri Feb 26 12:34:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1444919 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Dn8JB3YDcz9sCD for ; Fri, 26 Feb 2021 23:35:22 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9C8433950434; Fri, 26 Feb 2021 12:35:15 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 3F9D33842427 for ; Fri, 26 Feb 2021 12:35:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 3F9D33842427 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Julian_Brown@mentor.com IronPort-SDR: rGQajK0p5G3tW4haNKTuyJLigJKCy4mxPiBZm/CTmg2uToAsWyCOXau8/9e+1fTyiL2udbV0Ds oMPB3vyIGdeqWRHep+EKnkr7dF+J9jjYv/+o+U4xUEMUiQyaGyGLs4QIYT5HTcXWssLx6vBoPa mLdnqlWyuFMpch6A3eDF2JBIUKVsxJJkXco47MNoWXgLiEpSFd4tGIB6jgg4OUuAbSF5cZ3Oxg S8ztsOdWL548J68eq+34m7Clx6aPjb8SZuL1zzSpA8sTjcoDv5rYwVInabrCz65krjBWH9Tylw 5h8= X-IronPort-AV: E=Sophos;i="5.81,208,1610438400"; d="scan'208";a="58524567" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 26 Feb 2021 04:35:11 -0800 IronPort-SDR: pM7YfqZdJXA6PrzlwDvk3IZM496HKrY0kDwuq9EFRTQNVmQbR9KjLg+jvmOjbLB+wVvD1Ax4Du OHPrfVWNhvafEarZBOp6Z0AsQQDsqqMj/tPMveXMdJgvpnaVZCLAeWn2aJ+Uzg66Ifhl8vJBA1 56U43w4dKxL+bEnh4nm7wXdE5w5JuHfZnNLKzB8KoHZirkmPVJMx016euBl5NJnmuDQs1MhexJ qJZKSj3QPErDdfflaNZyYhHTWoPJsIcsmDi5Xqb2S2EiauU6FiKVvws+XI2cBGEMVntjVULWVB CSI= From: Julian Brown To: Subject: [PATCH 3/3] nvptx: NVPTX parts for OpenACC private variables patch Date: Fri, 26 Feb 2021 04:34:52 -0800 Message-ID: X-Mailer: git-send-email 2.29.2 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, KAM_STOCKGEN, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jakub@redhat.com, Thomas Schwinge Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" This patch contains the NVPTX backend support for placing OpenACC gang-private variables in GPU shared memory. Tested with offloading to NVPTX. This is substantially the same as the version previously posted: I will assume it is already approved (unless I hear objections), and will commit it at the same time as the rest of the series. (https://gcc.gnu.org/pipermail/gcc-patches/2018-October/507909.html) Thanks, Julian 2021-02-23 Chung-Lin Tang Julian Brown gcc/ * config/nvptx/nvptx.c (tree-pretty-print.h): Include. (gangprivate_shared_size): New global variable. (gangprivate_shared_align): Likewise. (gangprivate_shared_sym): Likewise. (gangprivate_shared_hmap): Likewise. (nvptx_option_override): Initialize gangprivate_shared_sym, gangprivate_shared_align. (nvptx_file_end): Output gangprivate_shared_sym. (nvptx_goacc_adjust_private_decl, nvptx_goacc_expand_accel_var): New functions. (nvptx_set_current_function): Clear gangprivate_shared_hmap. (TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook. (TARGET_GOACC_EXPAND_VAR_DECL): Likewise. --- gcc/config/nvptx/nvptx.c | 78 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 794c5a69db0..a0474b0077b 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -75,6 +75,7 @@ #include "fold-const.h" #include "intl.h" #include "opts.h" +#include "tree-pretty-print.h" /* This file should be included last. */ #include "target-def.h" @@ -167,6 +168,12 @@ static unsigned vector_red_align; static unsigned vector_red_partition; static GTY(()) rtx vector_red_sym; +/* Shared memory block for gang-private variables. */ +static unsigned gangprivate_shared_size; +static unsigned gangprivate_shared_align; +static GTY(()) rtx gangprivate_shared_sym; +static hash_map gangprivate_shared_hmap; + /* Global lock variable, needed for 128bit worker & gang reductions. */ static GTY(()) tree global_lock_var; @@ -251,6 +258,10 @@ nvptx_option_override (void) vector_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; vector_red_partition = 0; + gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared"); + SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED); + gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; + diagnose_openacc_conflict (TARGET_GOMP, "-mgomp"); diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack"); diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt"); @@ -5355,6 +5366,10 @@ nvptx_file_end (void) write_shared_buffer (asm_out_file, vector_red_sym, vector_red_align, vector_red_size); + if (gangprivate_shared_size) + write_shared_buffer (asm_out_file, gangprivate_shared_sym, + gangprivate_shared_align, gangprivate_shared_size); + if (need_softstack_decl) { write_var_marker (asm_out_file, false, true, "__nvptx_stacks"); @@ -6582,6 +6597,62 @@ nvptx_truly_noop_truncation (poly_uint64, poly_uint64) return false; } +/* Implement TARGET_GOACC_ADJUST_PRIVATE_DECL. Set "oacc gangprivate" + attribute for gang-private variable declarations. */ + +static tree +nvptx_goacc_adjust_private_decl (tree decl, int level) +{ + if (level != GOMP_DIM_GANG) + return decl; + + if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Setting 'oacc gangprivate' attribute for decl:"); + print_generic_decl (dump_file, decl, TDF_SLIM); + fputc ('\n', dump_file); + } + tree id = get_identifier ("oacc gangprivate"); + DECL_ATTRIBUTES (decl) = tree_cons (id, NULL, DECL_ATTRIBUTES (decl)); + } + + return decl; +} + +/* Implement TARGET_GOACC_EXPAND_VAR_DECL. Place "oacc gangprivate" + variables in shared memory. */ + +static rtx +nvptx_goacc_expand_var_decl (tree var) +{ + if (VAR_P (var) + && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var))) + { + unsigned int offset, *poffset; + poffset = gangprivate_shared_hmap.get (var); + if (poffset) + offset = *poffset; + else + { + unsigned HOST_WIDE_INT align = DECL_ALIGN (var); + gangprivate_shared_size + = (gangprivate_shared_size + align - 1) & ~(align - 1); + if (gangprivate_shared_align < align) + gangprivate_shared_align = align; + + offset = gangprivate_shared_size; + bool existed = gangprivate_shared_hmap.put (var, offset); + gcc_assert (!existed); + gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var)); + } + rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset); + return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr); + } + return NULL_RTX; +} + static GTY(()) tree nvptx_previous_fndecl; static void @@ -6590,6 +6661,7 @@ nvptx_set_current_function (tree fndecl) if (!fndecl || fndecl == nvptx_previous_fndecl) return; + gangprivate_shared_hmap.empty (); nvptx_previous_fndecl = fndecl; vector_red_partition = 0; oacc_bcast_partition = 0; @@ -6754,6 +6826,12 @@ nvptx_libc_has_function (enum function_class fn_class, tree type) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_GOACC_ADJUST_PRIVATE_DECL +#define TARGET_GOACC_ADJUST_PRIVATE_DECL nvptx_goacc_adjust_private_decl + +#undef TARGET_GOACC_EXPAND_VAR_DECL +#define TARGET_GOACC_EXPAND_VAR_DECL nvptx_goacc_expand_var_decl + #undef TARGET_SET_CURRENT_FUNCTION #define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function