From patchwork Thu Sep 5 01:45:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1158170 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-508338-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="Q34NBZIZ"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46P3Sl42QGz9s7T for ; Thu, 5 Sep 2019 11:46:31 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; q=dns; s= default; b=CUiQg/9PLZeRDeyqP5jQDho90+GtJCukDB4rEbITvHffoxHD9fN4E 1Bb4fsG7fA973j1k1M3sJzAg/Sr+qj7365pEen6bsLEC0TPW1wHCTV0FE6pax7VP IrpHmhIYeU2+ASHcGXfxOg0UdrLmbIL9o7W14z2LeXn/MnA15Tcf38= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; s=default; bh=ZfycdQroMnGySCRO71yn3N7ViK4=; b=Q34NBZIZI0g6IEfIQABBuHaOsZ+k w6j5MB7qH2I6N9uX5wN6647I760ySwlm9VNq6fs4XfBr2xRiqWhcdu8mLQQypsNK hLxOQAFyp0896ddKSVzTznRVqTjqZwQXsA1k18Ls+wT58TkUFwL8dFui+mgL2di2 oqrlzJyvL7nqq80= Received: (qmail 1899 invoked by alias); 5 Sep 2019 
01:46:14 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 1784 invoked by uid 89); 5 Sep 2019 01:46:13 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-18.4 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 spammy=asms X-HELO: esa3.mentor.iphmx.com Received: from esa3.mentor.iphmx.com (HELO esa3.mentor.iphmx.com) (68.232.137.180) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 05 Sep 2019 01:46:11 +0000 IronPort-SDR: A+Qllg4IpYm9n/zw8xvBW46c6NxkEHH8uSQg4SdVrz/ksu8iOFaDl+9fgGlFZLvV6Q8WHRymYN vEG8KT5KXQ6Oxgv42p4z8LXn9po8fRGdl5pCi7ftEmdVpkv7EyvPXLWH1lEIBSNCtafP/hZdf2 3egy/eCLL1tKSFM3bAwSaQnoNkWNqZKMVli0RUDHA+KTpgNmTlI9+XfxUBcjHURL5G8L8+KBrb SflCNLhFBHPZHu1iVtRYBRXoAk1cGPqcTNANxyNevP4Fqh2JOflOvNSD/vUF27NErovcryuEmQ dEM= Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 04 Sep 2019 17:46:09 -0800 IronPort-SDR: 5n6I0CLM2N5PcVRLkwzkkukG5y0+Xr3K3AEL+2IrLhCNNaJchJFB2AtiuoLsT1gmFWLVtds0Uw NiAP9ZpGvN6N4VJ4atvPEZTL/IUC1ZSBGnklYnkyZ3w3hzNjjK0xfCuiVF/ZCZe122uBOI0DOk 6yZDNCTu/7FZPnW+OIy/M7EgRkH0iQXe4pX2ILil9+Fy5p+xwiWhO4sCcZoLCiX0ftL8PoUD0n RMVFknzTTD6I+atwKO7hTRhTYq5nw1YBCbvaUCzWtOOgfBQcmWriuYeEm4hTza0xzCN+UoKgOy DCg= From: Julian Brown To: CC: Andrew Stubbs Subject: [PATCH 1/6] [og9] Target-dependent gang-private variable decl rewriting Date: Wed, 4 Sep 2019 18:45:50 -0700 Message-ID: <2c432092fae99930879687f88f2e8e97d29c786d.1567644180.git.julian@codesourcery.com> In-Reply-To: References: MIME-Version: 1.0 X-IsSubscribed: yes This patch adds support for rewriting variables marked up with the "oacc gangprivate" attributes in a target-dependent way in the oaccdevlow pass of the 
offload compiler. This behaviour is controlled by a new target hook, TARGET_GOACC_ADJUST_GANGPRIVATE_DECL. This is conceptually similar to the existing TARGET_GOACC_EXPAND_ACCEL_VAR hook, but that one works too late in the compilation process for AMD GCN. The patch to set the "oacc gangprivate" attribute was posted upstream here: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00749.html A version of that is already present on the og9 branch. Julian ChangeLog gcc/ * omp-offload.c (convert.h): Include. (struct addr_expr_rewrite_info): Add struct. (rewrite_addr_expr): New function. (is_sync_builtin_call): New function. (execute_oacc_device_lower): Support rewriting gang-private variables using target hook, and fix up addr_expr nodes afterwards. * target.def (adjust_gangprivate_decl): New target hook. * doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Document new target hook. * doc/tm.texi: Regenerate. --- gcc/ChangeLog.openacc | 13 +++++ gcc/doc/tm.texi | 4 ++ gcc/doc/tm.texi.in | 2 + gcc/omp-offload.c | 133 ++++++++++++++++++++++++++++++++++++++++++ gcc/target.def | 6 ++ 5 files changed, 158 insertions(+) diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc index a22f07c817c..b1c627b394c 100644 --- a/gcc/ChangeLog.openacc +++ b/gcc/ChangeLog.openacc @@ -1,3 +1,16 @@ +2019-09-05 Julian Brown + + * omp-offload.c (convert.h): Include. + (struct addr_expr_rewrite_info): Add struct. + (rewrite_addr_expr): New function. + (is_sync_builtin_call): New function. + (execute_oacc_device_lower): Support rewriting gang-private variables + using target hook, and fix up addr_expr nodes afterwards. + * target.def (adjust_gangprivate_decl): New target hook. + * doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Document new + target hook. + * doc/tm.texi: Regenerate. + 2019-08-13 Julian Brown * omp-oacc-kernels.c (add_wait): New function, split out of... 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 9b88498eb95..f3707c6abe3 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6162,6 +6162,10 @@ memories. A return value of NULL indicates that the target does not handle this VAR_DECL, and normal RTL expanding is resumed. @end deftypefn +@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_GANGPRIVATE_DECL (tree @var{var}) +Tweak variable declaration for a gang-private variable. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_GOACC_EXPLODE_ARGS (void) Define this hook to TRUE if arguments to offload regions should be exploded, i.e. passed as true arguments rather than in an argument array. diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index c9c4341a35f..cebadf4a502 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4210,6 +4210,8 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_GOACC_EXPAND_ACCEL_VAR +@hook TARGET_GOACC_ADJUST_GANGPRIVATE_DECL + @hook TARGET_GOACC_EXPLODE_ARGS @node Anchored Addresses diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 1129b00511e..c94dc956d7e 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. If not see #include "stringpool.h" #include "attribs.h" #include "cfgloop.h" +#include "convert.h" /* Describe the OpenACC looping structure of a function. The entire function is held in a 'NULL' loop. 
*/ @@ -1570,6 +1571,78 @@ maybe_discard_oacc_function (tree decl) return false; } +struct addr_expr_rewrite_info +{ + gimple *stmt; + hash_set<tree> *adjusted_vars; + bool avoid_pointer_conversion; + bool modified; +}; + +static tree +rewrite_addr_expr (tree *tp, int *walk_subtrees, void *data) +{ + walk_stmt_info *wi = (walk_stmt_info *) data; + addr_expr_rewrite_info *info = (addr_expr_rewrite_info *) wi->info; + + if (TREE_CODE (*tp) == ADDR_EXPR) + { + tree arg = TREE_OPERAND (*tp, 0); + + if (info->adjusted_vars->contains (arg)) + { + if (info->avoid_pointer_conversion) + { + *tp = build_fold_addr_expr (arg); + info->modified = true; + *walk_subtrees = 0; + } + else + { + gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt); + tree repl = build_fold_addr_expr (arg); + gimple *stmt1 + = gimple_build_assign (make_ssa_name (TREE_TYPE (repl)), repl); + tree conv = convert_to_pointer (TREE_TYPE (*tp), + gimple_assign_lhs (stmt1)); + gimple *stmt2 + = gimple_build_assign (make_ssa_name (TREE_TYPE (*tp)), conv); + gsi_insert_before (&gsi, stmt1, GSI_SAME_STMT); + gsi_insert_before (&gsi, stmt2, GSI_SAME_STMT); + *tp = gimple_assign_lhs (stmt2); + info->modified = true; + *walk_subtrees = 0; + } + } + } + + return NULL_TREE; +} + +/* Return TRUE if CALL is a call to a builtin atomic/sync operation. */ + +static bool +is_sync_builtin_call (gcall *call) +{ + tree callee = gimple_call_fndecl (call); + + if (callee != NULL_TREE + && gimple_call_builtin_p (call, BUILT_IN_NORMAL)) + switch (DECL_FUNCTION_CODE (callee)) + { +#undef DEF_SYNC_BUILTIN +#define DEF_SYNC_BUILTIN(ENUM, NAME, TYPE, ATTRS) case ENUM: +#include "sync-builtins.def" +#undef DEF_SYNC_BUILTIN + return true; + + default: + ; + } + + return false; +} + /* Main entry point for oacc transformations which run on the device compiler after LTO, so we know what the target device is at this point (including the host fallback). 
*/ @@ -1815,6 +1888,66 @@ execute_oacc_device_lower () gsi_next (&gsi); } + /* Make adjustments to gang-private local variables if required by the + target, e.g. forcing them into a particular address space. Afterwards, + ADDR_EXPR nodes which have adjusted variables as their argument need to + be modified in one of two ways: + + 1. They can be recreated, making a pointer to the variable in the new + address space, or + + 2. The address of the variable in the new address space can be taken, + converted to the default (original) address space, and the result of + that conversion substituted in place of the original ADDR_EXPR node. + + Which of these is done depends on the gimple statement being processed. + At present atomic operations and inline asms use (1), and everything else + uses (2). At least on AMD GCN, there are atomic operations that work + directly in the LDS address space. */ + + if (targetm.goacc.adjust_gangprivate_decl) + { + tree var; + unsigned i; + hash_set<tree> adjusted_vars; + + FOR_EACH_LOCAL_DECL (cfun, i, var) + { + if (!VAR_P (var) + || !lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var))) + continue; + + targetm.goacc.adjust_gangprivate_decl (var); + adjusted_vars.add (var); + } + + FOR_ALL_BB_FN (bb, cfun) + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + walk_stmt_info wi; + addr_expr_rewrite_info info; + + info.avoid_pointer_conversion + = (is_gimple_call (stmt) + && is_sync_builtin_call (as_a <gcall *> (stmt))) + || gimple_code (stmt) == GIMPLE_ASM; + info.stmt = stmt; + info.modified = false; + info.adjusted_vars = &adjusted_vars; + + memset (&wi, 0, sizeof (wi)); + wi.info = &info; + + walk_gimple_op (stmt, rewrite_addr_expr, &wi); + + if (info.modified) + update_stmt (stmt); + } + } + free_oacc_loop (loops); return 0; diff --git a/gcc/target.def b/gcc/target.def index d26b888a485..d82db232e40 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1729,6 +1729,12 
@@ handle this VAR_DECL, and normal RTL expanding is resumed.", rtx, (tree var), NULL) +DEFHOOK +(adjust_gangprivate_decl, +"Tweak variable declaration for a gang-private variable.", +void, (tree var), +NULL) + DEFHOOK (explode_args, "Define this hook to TRUE if arguments to offload regions should be\n\ From patchwork Thu Sep 5 01:45:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1158173 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-508341-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="qYjMpWxG"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46P3TV5cX0z9s4Y for ; Thu, 5 Sep 2019 11:47:10 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; q=dns; s= default; b=po6LMlk8Hr1XsxFF5ZTo+AJr3MSSa/7ad+fxGRAN9rQW4h2+5/td8 Dlsnd/S/N4bn57+oIsL4z3OMFTMFRIh/kEISCNqFAR34YOah/P04Pf7a6l6bDnaG T4dtftUmeV4d3zW3WHZ/10ck8xctwuT4othwwdL36yEd9oKMcCT87A= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references 
:mime-version:content-transfer-encoding:content-type; s=default; bh=kee0YSavKLFDpi49yJlFS4vdx+E=; b=qYjMpWxGcnlLKmZa5UWxH1i+Y1aT SNm1lyqhQ2S4wkdnPAbCwD3islz6YLGNUsGJmmWkkTJsSjRSBGvcCcvSahT3XXU1 aeAoTGTWRs3H8OWQKrRd0LLw77cXanRCg2EBk9JZ3Pze6s4AHNVp/a0PiWySxq5W LE1o4PqBmDIlw98= Received: (qmail 3832 invoked by alias); 5 Sep 2019 01:46:29 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 3728 invoked by uid 89); 5 Sep 2019 01:46:28 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-18.4 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, LIKELY_SPAM_BODY, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 spammy=transmission, gang, speeds X-HELO: esa3.mentor.iphmx.com Received: from esa3.mentor.iphmx.com (HELO esa3.mentor.iphmx.com) (68.232.137.180) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 05 Sep 2019 01:46:15 +0000 IronPort-SDR: JVy5rFUFd5Z9ATyxgqa3jM8big7D6BxW/BHdIXm7/eRzxlBdP4D9pjJhhop9OMHa1PpY9Idtu0 d0d4e+B/NVU5NIinxvAVx5wFb10ZE2NUiadzvtJA/gseQyHymJa8nKIneJ21P3CKhvxCIn6twS 9qhX7i86d3TvOhgBlx0n0+ATlGQHIS35N4A+J3rYfEUceW4+z4W1FSwWR9cB4ON9ruLsJPE+11 NESxbbKUut1/W0TPkBk365CFgYIzRJC8ka7JHwdRefjaKimhifbStznPTks7309dczQCTHKQpO IR4= Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 04 Sep 2019 17:46:13 -0800 IronPort-SDR: zJlfF95S1Q9hSjLg0uRBUXWOrTmbZdsh2O5GMoQuD8C080phL5rFe3pl590xjGutYRFkAz4pyC HTbjTJ29QkD5blF/C2dn92K6j0xYnKZPBMLF/bvf+aIEdL61hz5beDPBA84ppHEOkUkMmBJxHO e051P+KrFe5fJveSep7YeHpPPSjV4aH6zgQ0YqzeLHpHfZ9ItFughprnrtdw0QH7ZoTnVbuydR HP4P4rcVuZ+XYMtJyh/KGmw5TO0+qnvXn0ypc/uOJ1tqfHXOQrC+8m2/IY8rm19+U4Ioqrs1bH RvA= From: Julian Brown To: CC: Andrew Stubbs Subject: [PATCH 2/6] [og9] OpenACC middle-end 
worker-partitioning support Date: Wed, 4 Sep 2019 18:45:51 -0700 Message-ID: <1de0113e1a6807da85e5c7b0f7d473234f78dd45.1567644180.git.julian@codesourcery.com> In-Reply-To: References: MIME-Version: 1.0 X-IsSubscribed: yes This patch implements worker-partitioning support in the middle end, by rewriting gimple. The OpenACC execution model requires that code can run in either "worker single" mode where only a single worker per gang is active, or "worker partitioned" mode, where multiple workers per gang are active. This means we need to do something equivalent to spawning additional workers when transitioning from worker-single to worker-partitioned mode. However, GPUs typically fix the number of threads of invoked kernels at launch time, so we need to do something with the "extra" threads when they are not wanted. The scheme used is -- very briefly! -- to conditionalise each basic block that executes in "worker single" mode for worker 0 only. Conditional branches are handled specially so "idle" (non-0) workers follow along with worker 0. On transitioning to "worker partitioned" mode, any variables modified by worker 0 are propagated to the other workers via GPU shared memory. Special care is taken for routine calls, writes through pointers, and so forth. Much of omp-sese.c originates from code written for NVPTX by Nathan Sidwell (adapted to work on gimple instead of RTL) -- though at present, only the per-basic-block scheme is implemented, and the SESE-finding algorithm isn't yet used. Julian ChangeLog gcc/ * Makefile.in (OBJS): Add omp-sese.o. * omp-builtins.def (BUILT_IN_GOACC_BARRIER, BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START, BUILT_IN_GOACC_SINGLE_COPY_END): New builtins. * omp-offload.c (omp-sese.h): Include header. (oacc_loop_xform_head_tail): Call update_stmt for modified builtin calls. (oacc_loop_process): Likewise. (default_goacc_create_propagation_record): New default implementation for TARGET_GOACC_CREATE_PROPAGATION_RECORD hook. 
(execute_oacc_loop_designation): New. Split out of oacc_device_lower. (execute_oacc_gimple_workers): New. Likewise. (execute_oacc_device_lower): Recreate dims array. (pass_data_oacc_loop_designation, pass_data_oacc_gimple_workers): New. (pass_oacc_loop_designation, pass_oacc_gimple_workers): New. (make_pass_oacc_loop_designation, make_pass_oacc_gimple_workers): New. * omp-offload.h (oacc_fn_attrib_level): Add prototype. * omp-sese.c: New file. * omp-sese.h: New file. * passes.def (pass_oacc_loop_designation, pass_oacc_gimple_workers): Add passes. * target.def (worker_partitioning, create_propagation_record): Add target hooks. * targhooks.h (default_goacc_create_propagation_record): Add prototype. * tree-pass.h (make_pass_oacc_loop_designation, make_pass_oacc_gimple_workers): Add prototypes. * doc/tm.texi.in (TARGET_GOACC_WORKER_PARTITIONING, TARGET_GOACC_CREATE_PROPAGATION_RECORD): Add documentation hooks. * doc/tm.texi: Regenerate. --- gcc/ChangeLog.openacc | 32 + gcc/Makefile.in | 1 + gcc/doc/tm.texi | 10 + gcc/doc/tm.texi.in | 4 + gcc/omp-builtins.def | 8 + gcc/omp-offload.c | 159 +++- gcc/omp-offload.h | 1 + gcc/omp-sese.c | 2036 +++++++++++++++++++++++++++++++++++++++++ gcc/omp-sese.h | 26 + gcc/passes.def | 2 + gcc/target.def | 13 + gcc/targhooks.h | 1 + gcc/tree-pass.h | 2 + 13 files changed, 2276 insertions(+), 19 deletions(-) create mode 100644 gcc/omp-sese.c create mode 100644 gcc/omp-sese.h diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc index b1c627b394c..a2b2dcfcf26 100644 --- a/gcc/ChangeLog.openacc +++ b/gcc/ChangeLog.openacc @@ -1,3 +1,35 @@ +2019-09-05 Julian Brown + + * Makefile.in (OBJS): Add omp-sese.o. + * omp-builtins.def (BUILT_IN_GOACC_BARRIER, BUILT_IN_GOACC_SINGLE_START, + BUILT_IN_GOACC_SINGLE_COPY_START, BUILT_IN_GOACC_SINGLE_COPY_END): New + builtins. + * omp-offload.c (omp-sese.h): Include header. + (oacc_loop_xform_head_tail): Call update_stmt for modified builtin + calls. + (oacc_loop_process): Likewise. 
+ (default_goacc_create_propagation_record): New default implementation + for TARGET_GOACC_CREATE_PROPAGATION_RECORD hook. + (execute_oacc_loop_designation): New. Split out of oacc_device_lower. + (execute_oacc_gimple_workers): New. Likewise. + (execute_oacc_device_lower): Recreate dims array. + (pass_data_oacc_loop_designation, pass_data_oacc_gimple_workers): New. + (pass_oacc_loop_designation, pass_oacc_gimple_workers): New. + (make_pass_oacc_loop_designation, make_pass_oacc_gimple_workers): New. + * omp-offload.h (oacc_fn_attrib_level): Add prototype. + * omp-sese.c: New file. + * omp-sese.h: New file. + * passes.def (pass_oacc_loop_designation, pass_oacc_gimple_workers): + Add passes. + * target.def (worker_partitioning, create_propagation_record): Add + target hooks. + * targhooks.h (default_goacc_create_propagation_record): Add prototype. + * tree-pass.h (make_pass_oacc_loop_designation, + make_pass_oacc_gimple_workers): Add prototypes. + * doc/tm.texi.in (TARGET_GOACC_WORKER_PARTITIONING, + TARGET_GOACC_CREATE_PROPAGATION_RECORD): Add documentation hooks. + * doc/tm.texi: Regenerate. + 2019-09-05 Julian Brown * omp-offload.c (convert.h): Include. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 9e93eaaa173..5159cd6a30c 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1429,6 +1429,7 @@ OBJS = \ omp-expand.o \ omp-general.o \ omp-grid.o \ + omp-sese.o \ omp-low.o \ omp-oacc-kernels.o \ omp-simd-clone.o \ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index f3707c6abe3..536a436b1c4 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6166,6 +6166,16 @@ handle this VAR_DECL, and normal RTL expanding is resumed. Tweak variable declaration for a gang-private variable. @end deftypefn +@deftypevr {Target Hook} bool TARGET_GOACC_WORKER_PARTITIONING +Use gimple transformation for worker neutering/broadcasting. 
+@end deftypevr + +@deftypefn {Target Hook} tree TARGET_GOACC_CREATE_PROPAGATION_RECORD (tree @var{rec}, bool @var{sender}, const char *@var{name}) +Create a record used to propagate local-variable state from an active +worker to other workers. A possible implementation might adjust the type +of REC to place the new variable in shared GPU memory. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_GOACC_EXPLODE_ARGS (void) Define this hook to TRUE if arguments to offload regions should be exploded, i.e. passed as true arguments rather than in an argument array. diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index cebadf4a502..c0b92f25da7 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4212,6 +4212,10 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_GOACC_ADJUST_GANGPRIVATE_DECL +@hook TARGET_GOACC_WORKER_PARTITIONING + +@hook TARGET_GOACC_CREATE_PROPAGATION_RECORD + @hook TARGET_GOACC_EXPLODE_ARGS @node Anchored Addresses diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index 9961c287494..a8f10e3389e 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -73,6 +73,8 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_BARRIER, "GOMP_barrier", BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_BARRIER_CANCEL, "GOMP_barrier_cancel", BT_FN_BOOL, ATTR_NOTHROW_LEAF_LIST) +DEF_GOACC_BUILTIN (BUILT_IN_GOACC_BARRIER, "GOACC_barrier", + BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASKWAIT, "GOMP_taskwait", BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASKWAIT_DEPEND, "GOMP_taskwait_depend", @@ -410,6 +412,12 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, "GOMP_single_copy_start", BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, "GOMP_single_copy_end", BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST) +DEF_GOACC_BUILTIN (BUILT_IN_GOACC_SINGLE_START, "GOACC_single_start", + BT_FN_BOOL, ATTR_NOTHROW_LEAF_LIST) 
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_SINGLE_COPY_START, "GOACC_single_copy_start", + BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST) +DEF_GOACC_BUILTIN (BUILT_IN_GOACC_SINGLE_COPY_END, "GOACC_single_copy_end", + BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_REGISTER, "GOMP_offload_register_ver", BT_FN_VOID_UINT_PTR_INT_PTR, ATTR_NOTHROW_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_UNREGISTER, diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index c94dc956d7e..a6f64aac37e 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. If not see #include "stringpool.h" #include "attribs.h" #include "cfgloop.h" +#include "omp-sese.h" #include "convert.h" /* Describe the OpenACC looping structure of a function. The entire @@ -1117,6 +1118,8 @@ oacc_loop_xform_head_tail (gcall *from, int level) else if (gimple_call_internal_p (stmt, IFN_GOACC_REDUCTION)) *gimple_call_arg_ptr (stmt, 3) = replacement; + update_stmt (stmt); + gsi_next (&gsi); while (gsi_end_p (gsi)) gsi = gsi_start_bb (single_succ (gsi_bb (gsi))); @@ -1141,25 +1144,28 @@ oacc_loop_process (oacc_loop *loop) gcall *call; for (ix = 0; loop->ifns.iterate (ix, &call); ix++) - switch (gimple_call_internal_fn (call)) - { - case IFN_GOACC_LOOP: + { + switch (gimple_call_internal_fn (call)) { - bool is_e = gimple_call_arg (call, 5) == integer_minus_one_node; - gimple_call_set_arg (call, 5, is_e ? e_mask_arg : mask_arg); - if (!is_e) - gimple_call_set_arg (call, 4, chunk_arg); - } - break; + case IFN_GOACC_LOOP: + { + bool is_e = gimple_call_arg (call, 5) == integer_minus_one_node; + gimple_call_set_arg (call, 5, is_e ? 
e_mask_arg : mask_arg); + if (!is_e) + gimple_call_set_arg (call, 4, chunk_arg); + } + break; - case IFN_GOACC_TILE: - gimple_call_set_arg (call, 3, mask_arg); - gimple_call_set_arg (call, 4, e_mask_arg); - break; + case IFN_GOACC_TILE: + gimple_call_set_arg (call, 3, mask_arg); + gimple_call_set_arg (call, 4, e_mask_arg); + break; - default: - gcc_unreachable (); - } + default: + gcc_unreachable (); + } + update_stmt (call); + } unsigned dim = GOMP_DIM_GANG; unsigned mask = loop->mask | loop->e_mask; @@ -1643,12 +1649,27 @@ is_sync_builtin_call (gcall *call) return false; } +/* Default implementation creates a temporary variable of type RECORD_TYPE if + SENDER is true, else a pointer to RECORD_TYPE if SENDER is false. */ + +tree +default_goacc_create_propagation_record (tree record_type, bool sender, + const char *name) +{ + tree type = record_type; + + if (!sender) + type = build_pointer_type (type); + + return create_tmp_var (type, name); +} + /* Main entry point for oacc transformations which run on the device compiler after LTO, so we know what the target device is at this point (including the host fallback). */ static unsigned int -execute_oacc_device_lower () +execute_oacc_loop_designation () { tree attr = oacc_get_fn_attrib (current_function_decl); if (!attr) @@ -1777,10 +1798,36 @@ execute_oacc_device_lower () free_oacc_loop (l); } + free_oacc_loop (loops); + /* Offloaded targets may introduce new basic blocks, which require dominance information to update SSA. */ calculate_dominance_info (CDI_DOMINATORS); + return 0; +} + +int +execute_oacc_gimple_workers (void) +{ + oacc_do_neutering (); + calculate_dominance_info (CDI_DOMINATORS); + return 0; +} + +static unsigned int +execute_oacc_device_lower () +{ + int dims[GOMP_DIM_MAX]; + tree attr = oacc_get_fn_attrib (current_function_decl); + + if (!attr) + /* Not an offloaded function. 
*/ + return 0; + + for (unsigned i = 0; i < GOMP_DIM_MAX; i++) + dims[i] = oacc_get_fn_dim_size (current_function_decl, i); + /* Now lower internal loop functions to target-specific code sequences. */ basic_block bb; @@ -1948,8 +1995,6 @@ execute_oacc_device_lower () } } - free_oacc_loop (loops); - return 0; } @@ -1990,6 +2035,70 @@ default_goacc_dim_limit (int ARG_UNUSED (axis)) namespace { +const pass_data pass_data_oacc_loop_designation = +{ + GIMPLE_PASS, /* type */ + "oaccloops", /* name */ + OPTGROUP_OMP, /* optinfo_flags */ + TV_NONE, /* tv_id */ + PROP_cfg, /* properties_required */ + 0 /* Possibly PROP_gimple_eomp. */, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_update_ssa | TODO_cleanup_cfg + | TODO_rebuild_alias, /* todo_flags_finish */ +}; + +class pass_oacc_loop_designation : public gimple_opt_pass +{ +public: + pass_oacc_loop_designation (gcc::context *ctxt) + : gimple_opt_pass (pass_data_oacc_loop_designation, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) { return flag_openacc; }; + + virtual unsigned int execute (function *) + { + return execute_oacc_loop_designation (); + } + +}; // class pass_oacc_loop_designation + +const pass_data pass_data_oacc_gimple_workers = +{ + GIMPLE_PASS, /* type */ + "oaccworkers", /* name */ + OPTGROUP_OMP, /* optinfo_flags */ + TV_NONE, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */ +}; + +class pass_oacc_gimple_workers : public gimple_opt_pass +{ +public: + pass_oacc_gimple_workers (gcc::context *ctxt) + : gimple_opt_pass (pass_data_oacc_gimple_workers, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return flag_openacc && targetm.goacc.worker_partitioning; + }; + + virtual unsigned int execute (function *) + { + return execute_oacc_gimple_workers (); + } + +}; 
// class pass_oacc_gimple_workers + const pass_data pass_data_oacc_device_lower = { GIMPLE_PASS, /* type */ @@ -2022,6 +2131,18 @@ public: } // anon namespace +gimple_opt_pass * +make_pass_oacc_loop_designation (gcc::context *ctxt) +{ + return new pass_oacc_loop_designation (ctxt); +} + +gimple_opt_pass * +make_pass_oacc_gimple_workers (gcc::context *ctxt) +{ + return new pass_oacc_gimple_workers (ctxt); +} + gimple_opt_pass * make_pass_oacc_device_lower (gcc::context *ctxt) { diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h index 21c9236b74f..b441854585f 100644 --- a/gcc/omp-offload.h +++ b/gcc/omp-offload.h @@ -29,6 +29,7 @@ extern int oacc_fn_attrib_level (tree attr); extern GTY(()) vec<tree, va_gc> *offload_funcs; extern GTY(()) vec<tree, va_gc> *offload_vars; +extern int oacc_fn_attrib_level (tree attr); extern void omp_finish_file (void); #endif /* GCC_OMP_DEVICE_H */ diff --git a/gcc/omp-sese.c b/gcc/omp-sese.c new file mode 100644 index 00000000000..cb2389eb55c --- /dev/null +++ b/gcc/omp-sese.c @@ -0,0 +1,2036 @@ +/* Find single-entry, single-exit regions for OpenACC. + Copyright (C) 2014-2017 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. 
*/ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "tree.h" +#include "gimple.h" +#include "tree-pass.h" +#include "ssa.h" +#include "cgraph.h" +#include "pretty-print.h" +#include "fold-const.h" +#include "gimplify.h" +#include "gimple-iterator.h" +#include "gimple-walk.h" +#include "tree-inline.h" +#include "langhooks.h" +#include "omp-general.h" +#include "omp-low.h" +#include "omp-grid.h" +#include "gimple-pretty-print.h" +#include "cfghooks.h" +#include "insn-config.h" +#include "recog.h" +#include "internal-fn.h" +#include "bitmap.h" +#include "tree-nested.h" +#include "stor-layout.h" +#include "tree-ssa-threadupdate.h" +#include "tree-into-ssa.h" +#include "splay-tree.h" +#include "target.h" +#include "cfgloop.h" +#include "tree-cfg.h" +#include "omp-offload.h" +#include "attribs.h" + +/* Loop structure of the function. The entire function is described as + a NULL loop. */ + +struct parallel +{ + /* Parent parallel. */ + parallel *parent; + + /* Next sibling parallel. */ + parallel *next; + + /* First child parallel. */ + parallel *inner; + + /* Partitioning mask of the parallel. */ + unsigned mask; + + /* Partitioning used within inner parallels. */ + unsigned inner_mask; + + /* Location of parallel forked and join. The forked is the first + block in the parallel and the join is the first block after of + the partition. */ + basic_block forked_block; + basic_block join_block; + + gimple *forked_stmt; + gimple *join_stmt; + + gimple *fork_stmt; + gimple *joining_stmt; + + /* Basic blocks in this parallel, but not in child parallels. The + FORKED and JOINING blocks are in the partition. The FORK and JOIN + blocks are not. */ + auto_vec<basic_block> blocks; + + tree record_type; + tree sender_decl; + tree receiver_decl; + +public: + parallel (parallel *parent, unsigned mode); + ~parallel (); +}; + +/* Constructor links the new parallel into its parent's chain of + children. 
*/ + +parallel::parallel (parallel *parent_, unsigned mask_) + :parent (parent_), next (0), inner (0), mask (mask_), inner_mask (0) +{ + forked_block = join_block = 0; + forked_stmt = join_stmt = NULL; + fork_stmt = joining_stmt = NULL; + + record_type = NULL_TREE; + sender_decl = NULL_TREE; + receiver_decl = NULL_TREE; + + if (parent) + { + next = parent->inner; + parent->inner = this; + } +} + +parallel::~parallel () +{ + delete inner; + delete next; +} + +static bool +local_var_based_p (tree decl) +{ + switch (TREE_CODE (decl)) + { + case VAR_DECL: + return !is_global_var (decl); + + case COMPONENT_REF: + case BIT_FIELD_REF: + case ARRAY_REF: + return local_var_based_p (TREE_OPERAND (decl, 0)); + + default: + return false; + } +} + +/* Map of basic blocks to gimple stmts. */ +typedef hash_map<basic_block, gimple *> bb_stmt_map_t; + +/* Calls to OpenACC routines are made by all workers/wavefronts/warps, since + the routine likely contains partitioned loops (else will do its own + neutering and variable propagation). Return TRUE if a function call CALL + should be made in (worker) single mode instead, rather than redundant + mode. */ + +static bool +omp_sese_active_worker_call (gcall *call) +{ +#define GOMP_DIM_SEQ GOMP_DIM_MAX + tree fndecl = gimple_call_fndecl (call); + + if (!fndecl) + return true; + + tree attrs = oacc_get_fn_attrib (fndecl); + + if (!attrs) + return true; + + int level = oacc_fn_attrib_level (attrs); + + /* Neither regular functions nor "seq" routines should be run by all threads + in worker-single mode. */ + return level == -1 || level == GOMP_DIM_SEQ; +#undef GOMP_DIM_SEQ +} + +/* Split basic blocks such that each forked and join marker is at + the start of its basic block. Thus afterwards each block will + have a single partitioning mode. We also do the same for return + statements, as they are executed by every thread. Populate MAP with + head and tail blocks.
We also clear the BB visited flag, which is + used when finding partitions. */ + +static void +omp_sese_split_blocks (bb_stmt_map_t *map) +{ + auto_vec worklist; + basic_block block; + + /* Locate all the reorg instructions of interest. */ + FOR_ALL_BB_FN (block, cfun) + { + /* Clear visited flag, for use by parallel locator */ + block->flags &= ~BB_VISITED; + + for (gimple_stmt_iterator gsi = gsi_start_bb (block); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (gimple_call_internal_p (stmt, IFN_UNIQUE)) + { + enum ifn_unique_kind k = ((enum ifn_unique_kind) + TREE_INT_CST_LOW (gimple_call_arg (stmt, 0))); + gcall *call = as_a (stmt); + + if (k == IFN_UNIQUE_OACC_JOIN) + worklist.safe_push (stmt); + else if (k == IFN_UNIQUE_OACC_FORK) + { + gcc_assert (gsi_one_before_end_p (gsi)); + basic_block forked_block = single_succ (block); + gimple_stmt_iterator gsi2 = gsi_start_bb (forked_block); + + /* We push a NOP as a placeholder for the "forked" stmt. + This is then recognized in omp_sese_find_par. */ + gimple *nop = gimple_build_nop (); + gsi_insert_before (&gsi2, nop, GSI_SAME_STMT); + + worklist.safe_push (nop); + } + } + else if (gimple_code (stmt) == GIMPLE_RETURN + || gimple_code (stmt) == GIMPLE_COND + || gimple_code (stmt) == GIMPLE_SWITCH + || (gimple_code (stmt) == GIMPLE_CALL + && !gimple_call_internal_p (stmt) + && !omp_sese_active_worker_call (as_a (stmt)))) + worklist.safe_push (stmt); + else if (is_gimple_assign (stmt)) + { + tree lhs = gimple_assign_lhs (stmt); + + /* Force assignments to components/fields/elements of local + aggregates into fully-partitioned (redundant) mode. This + avoids having to broadcast the whole aggregate. The RHS of + the assignment will be propagated using the normal + mechanism. 
*/ + + switch (TREE_CODE (lhs)) + { + case COMPONENT_REF: + case BIT_FIELD_REF: + case ARRAY_REF: + { + tree aggr = TREE_OPERAND (lhs, 0); + + if (local_var_based_p (aggr)) + worklist.safe_push (stmt); + } + break; + + default: + ; + } + } + } + } + + /* Split blocks on the worklist. */ + unsigned ix; + gimple *stmt; + + for (ix = 0; worklist.iterate (ix, &stmt); ix++) + { + basic_block block = gimple_bb (stmt); + + if (gimple_code (stmt) == GIMPLE_COND) + { + gcond *orig_cond = as_a (stmt); + tree_code code = gimple_expr_code (orig_cond); + tree pred = make_ssa_name (boolean_type_node); + gimple *asgn = gimple_build_assign (pred, code, + gimple_cond_lhs (orig_cond), + gimple_cond_rhs (orig_cond)); + gcond *new_cond + = gimple_build_cond (NE_EXPR, pred, boolean_false_node, + gimple_cond_true_label (orig_cond), + gimple_cond_false_label (orig_cond)); + + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + gsi_insert_before (&gsi, asgn, GSI_SAME_STMT); + gsi_replace (&gsi, new_cond, true); + + edge e = split_block (block, asgn); + block = e->dest; + map->get_or_insert (block) = new_cond; + } + else if ((gimple_code (stmt) == GIMPLE_CALL + && !gimple_call_internal_p (stmt)) + || is_gimple_assign (stmt)) + { + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + gsi_prev (&gsi); + + edge call = split_block (block, gsi_stmt (gsi)); + + gimple *call_stmt = gsi_stmt (gsi_start_bb (call->dest)); + + edge call_to_ret = split_block (call->dest, call_stmt); + + map->get_or_insert (call_to_ret->src) = call_stmt; + } + else + { + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + gsi_prev (&gsi); + + if (gsi_end_p (gsi)) + map->get_or_insert (block) = stmt; + else + { + /* Split block before insn. The insn is in the new block. 
*/ + edge e = split_block (block, gsi_stmt (gsi)); + + block = e->dest; + map->get_or_insert (block) = stmt; + } + } + } +} + +static const char * +mask_name (unsigned mask) +{ + switch (mask) + { + case 0: return "gang redundant"; + case 1: return "gang partitioned"; + case 2: return "worker partitioned"; + case 3: return "gang+worker partitioned"; + case 4: return "vector partitioned"; + case 5: return "gang+vector partitioned"; + case 6: return "worker+vector partitioned"; + case 7: return "fully partitioned"; + default: return ""; + } +} + +/* Dump this parallel and all its inner parallels. */ + +static void +omp_sese_dump_pars (parallel *par, unsigned depth) +{ + fprintf (dump_file, "%u: mask %d (%s) head=%d, tail=%d\n", + depth, par->mask, mask_name (par->mask), + par->forked_block ? par->forked_block->index : -1, + par->join_block ? par->join_block->index : -1); + + fprintf (dump_file, " blocks:"); + + basic_block block; + for (unsigned ix = 0; par->blocks.iterate (ix, &block); ix++) + fprintf (dump_file, " %d", block->index); + fprintf (dump_file, "\n"); + if (par->inner) + omp_sese_dump_pars (par->inner, depth + 1); + + if (par->next) + omp_sese_dump_pars (par->next, depth); +} + +/* If BLOCK contains a fork/join marker, process it to create or + terminate a loop structure. Add this block to the current loop, + and then walk successor blocks. */ + +static parallel * +omp_sese_find_par (bb_stmt_map_t *map, parallel *par, basic_block block) +{ + if (block->flags & BB_VISITED) + return par; + block->flags |= BB_VISITED; + + if (gimple **stmtp = map->get (block)) + { + gimple *stmt = *stmtp; + + if (gimple_code (stmt) == GIMPLE_COND + || gimple_code (stmt) == GIMPLE_SWITCH + || gimple_code (stmt) == GIMPLE_RETURN + || (gimple_code (stmt) == GIMPLE_CALL + && !gimple_call_internal_p (stmt)) + || is_gimple_assign (stmt)) + { + /* A single block that is forced to be at the maximum partition + level. Make a singleton par for it. 
*/ + par = new parallel (par, GOMP_DIM_MASK (GOMP_DIM_GANG) + | GOMP_DIM_MASK (GOMP_DIM_WORKER) + | GOMP_DIM_MASK (GOMP_DIM_VECTOR)); + par->forked_block = block; + par->forked_stmt = stmt; + par->blocks.safe_push (block); + par = par->parent; + goto walk_successors; + } + else if (gimple_nop_p (stmt)) + { + basic_block pred = single_pred (block); + gcc_assert (pred); + gimple_stmt_iterator gsi = gsi_last_bb (pred); + gimple *final_stmt = gsi_stmt (gsi); + + if (gimple_call_internal_p (final_stmt, IFN_UNIQUE)) + { + gcall *call = as_a (final_stmt); + enum ifn_unique_kind k = ((enum ifn_unique_kind) + TREE_INT_CST_LOW (gimple_call_arg (call, 0))); + + if (k == IFN_UNIQUE_OACC_FORK) + { + HOST_WIDE_INT dim + = TREE_INT_CST_LOW (gimple_call_arg (call, 2)); + unsigned mask = (dim >= 0) ? GOMP_DIM_MASK (dim) : 0; + + par = new parallel (par, mask); + par->forked_block = block; + par->forked_stmt = final_stmt; + par->fork_stmt = stmt; + } + else + gcc_unreachable (); + } + else + gcc_unreachable (); + } + else if (gimple_call_internal_p (stmt, IFN_UNIQUE)) + { + gcall *call = as_a (stmt); + enum ifn_unique_kind k = ((enum ifn_unique_kind) + TREE_INT_CST_LOW (gimple_call_arg (call, 0))); + if (k == IFN_UNIQUE_OACC_JOIN) + { + HOST_WIDE_INT dim = TREE_INT_CST_LOW (gimple_call_arg (stmt, 2)); + unsigned mask = (dim >= 0) ? GOMP_DIM_MASK (dim) : 0; + + gcc_assert (par->mask == mask); + par->join_block = block; + par->join_stmt = stmt; + par = par->parent; + } + else + gcc_unreachable (); + } + else + gcc_unreachable (); + } + + if (par) + /* Add this block onto the current loop's list of blocks. */ + par->blocks.safe_push (block); + else + /* This must be the entry block. Create a NULL parallel. */ + par = new parallel (0, 0); + +walk_successors: + /* Walk successor blocks. */ + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, block->succs) + omp_sese_find_par (map, par, e->dest); + + return par; +} + +/* DFS walk the CFG looking for fork & join markers. 
Construct + loop structures as we go. MAP is a mapping of basic blocks + to head & tail markers, discovered when splitting blocks. This + speeds up the discovery. We rely on the BB visited flag having + been cleared when splitting blocks. */ + +static parallel * +omp_sese_discover_pars (bb_stmt_map_t *map) +{ + basic_block block; + + /* Mark the exit block as visited. */ + block = EXIT_BLOCK_PTR_FOR_FN (cfun); + block->flags |= BB_VISITED; + + /* And the entry block as not. */ + block = ENTRY_BLOCK_PTR_FOR_FN (cfun); + block->flags &= ~BB_VISITED; + + parallel *par = omp_sese_find_par (map, 0, block); + + if (dump_file) + { + fprintf (dump_file, "\nLoops\n"); + omp_sese_dump_pars (par, 0); + fprintf (dump_file, "\n"); + } + + return par; +} + +static void +populate_single_mode_bitmaps (parallel *par, bitmap worker_single, + bitmap vector_single, unsigned outer_mask, + int depth) +{ + unsigned mask = outer_mask | par->mask; + + basic_block block; + + for (unsigned i = 0; par->blocks.iterate (i, &block); i++) + { + if ((mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) == 0) + bitmap_set_bit (worker_single, block->index); + + if ((mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)) == 0) + bitmap_set_bit (vector_single, block->index); + } + + if (par->inner) + populate_single_mode_bitmaps (par->inner, worker_single, vector_single, + mask, depth + 1); + if (par->next) + populate_single_mode_bitmaps (par->next, worker_single, vector_single, + outer_mask, depth); +} + +/* A map from SSA names or var decls to record fields. */ + +typedef hash_map<tree, tree> field_map_t; + +/* For each propagation record type, this is a map from SSA names or var decls + to propagate, to the field in the record type that should be used for + transmission and reception.
*/ + +typedef hash_map<tree, field_map_t *> record_field_map_t; + +static GTY(()) record_field_map_t *field_map; + +static void +install_var_field (tree var, tree record_type) +{ + field_map_t *fields = *field_map->get (record_type); + tree name; + char tmp[20]; + + if (TREE_CODE (var) == SSA_NAME) + name = SSA_NAME_IDENTIFIER (var); + else if (TREE_CODE (var) == VAR_DECL) + name = DECL_NAME (var); + else + gcc_unreachable (); + + gcc_assert (!fields->get (var)); + + if (!name) + { + sprintf (tmp, "_%u", (unsigned) SSA_NAME_VERSION (var)); + name = get_identifier (tmp); + } + + tree type = TREE_TYPE (var); + + if (POINTER_TYPE_P (type) + && TYPE_RESTRICT (type)) + type = build_qualified_type (type, TYPE_QUALS (type) & ~TYPE_QUAL_RESTRICT); + + tree field = build_decl (BUILTINS_LOCATION, FIELD_DECL, name, type); + + if (TREE_CODE (var) == VAR_DECL && type == TREE_TYPE (var)) + { + SET_DECL_ALIGN (field, DECL_ALIGN (var)); + DECL_USER_ALIGN (field) = DECL_USER_ALIGN (var); + TREE_THIS_VOLATILE (field) = TREE_THIS_VOLATILE (var); + } + else + SET_DECL_ALIGN (field, TYPE_ALIGN (type)); + + fields->put (var, field); + + insert_field_into_struct (record_type, field); +} + +/* Sets of SSA_NAMEs or VAR_DECLs to propagate.
*/ +typedef hash_set<tree> propagation_set; + +static void +find_ssa_names_to_propagate (parallel *par, unsigned outer_mask, + bitmap worker_single, bitmap vector_single, + vec<propagation_set *> *prop_set) +{ + unsigned mask = outer_mask | par->mask; + + if (par->inner) + find_ssa_names_to_propagate (par->inner, mask, worker_single, + vector_single, prop_set); + if (par->next) + find_ssa_names_to_propagate (par->next, outer_mask, worker_single, + vector_single, prop_set); + + if (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) + { + basic_block block; + int ix; + + for (ix = 0; par->blocks.iterate (ix, &block); ix++) + { + for (gphi_iterator psi = gsi_start_phis (block); + !gsi_end_p (psi); gsi_next (&psi)) + { + gphi *phi = psi.phi (); + use_operand_p use; + ssa_op_iter iter; + + FOR_EACH_PHI_ARG (use, phi, iter, SSA_OP_USE) + { + tree var = USE_FROM_PTR (use); + + if (TREE_CODE (var) != SSA_NAME) + continue; + + gimple *def_stmt = SSA_NAME_DEF_STMT (var); + + if (gimple_nop_p (def_stmt)) + continue; + + basic_block def_bb = gimple_bb (def_stmt); + + if (bitmap_bit_p (worker_single, def_bb->index)) + { + if (!(*prop_set)[def_bb->index]) + (*prop_set)[def_bb->index] = new propagation_set; + + propagation_set *ws_prop = (*prop_set)[def_bb->index]; + + ws_prop->add (var); + } + } + } + + for (gimple_stmt_iterator gsi = gsi_start_bb (block); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + use_operand_p use; + ssa_op_iter iter; + gimple *stmt = gsi_stmt (gsi); + + FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE) + { + tree var = USE_FROM_PTR (use); + + gimple *def_stmt = SSA_NAME_DEF_STMT (var); + + if (gimple_nop_p (def_stmt)) + continue; + + basic_block def_bb = gimple_bb (def_stmt); + + if (bitmap_bit_p (worker_single, def_bb->index)) + { + if (!(*prop_set)[def_bb->index]) + (*prop_set)[def_bb->index] = new propagation_set; + + propagation_set *ws_prop = (*prop_set)[def_bb->index]; + + ws_prop->add (var); + } + } + } + } + } +} + +/* Callback for walk_gimple_stmt to find RHS VAR_DECLs (uses) in a 
+ statement. */ + +static tree +find_partitioned_var_uses_1 (tree *node, int *, void *data) +{ + walk_stmt_info *wi = (walk_stmt_info *) data; + hash_set *partitioned_var_uses = (hash_set *) wi->info; + + if (!wi->is_lhs && VAR_P (*node)) + partitioned_var_uses->add (*node); + + return NULL_TREE; +} + +static void +find_partitioned_var_uses (parallel *par, unsigned outer_mask, + hash_set *partitioned_var_uses) +{ + unsigned mask = outer_mask | par->mask; + + if (par->inner) + find_partitioned_var_uses (par->inner, mask, partitioned_var_uses); + if (par->next) + find_partitioned_var_uses (par->next, outer_mask, partitioned_var_uses); + + if (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) + { + basic_block block; + int ix; + + for (ix = 0; par->blocks.iterate (ix, &block); ix++) + for (gimple_stmt_iterator gsi = gsi_start_bb (block); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + walk_stmt_info wi; + memset (&wi, 0, sizeof (wi)); + wi.info = (void *) partitioned_var_uses; + walk_gimple_stmt (&gsi, NULL, find_partitioned_var_uses_1, &wi); + } + } +} + +static void +find_local_vars_to_propagate (parallel *par, unsigned outer_mask, + hash_set *partitioned_var_uses, + vec *prop_set) +{ + unsigned mask = outer_mask | par->mask; + + if (par->inner) + find_local_vars_to_propagate (par->inner, mask, partitioned_var_uses, + prop_set); + if (par->next) + find_local_vars_to_propagate (par->next, outer_mask, partitioned_var_uses, + prop_set); + + if (!(mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))) + { + basic_block block; + int ix; + + for (ix = 0; par->blocks.iterate (ix, &block); ix++) + { + for (gimple_stmt_iterator gsi = gsi_start_bb (block); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + tree var; + unsigned i; + + FOR_EACH_LOCAL_DECL (cfun, i, var) + { + if (!VAR_P (var) + || is_global_var (var) + || AGGREGATE_TYPE_P (TREE_TYPE (var)) + || !partitioned_var_uses->contains (var) + || lookup_attribute ("oacc gangprivate", + DECL_ATTRIBUTES (var))) + continue; + + 
if (stmt_may_clobber_ref_p (stmt, var)) + { + if (dump_file) + { + fprintf (dump_file, "bb %u: local variable may be " + "clobbered in %s mode: ", block->index, + mask_name (mask)); + print_generic_expr (dump_file, var, TDF_SLIM); + fprintf (dump_file, "\n"); + } + + if (!(*prop_set)[block->index]) + (*prop_set)[block->index] = new propagation_set; + + propagation_set *ws_prop + = (*prop_set)[block->index]; + + ws_prop->add (var); + } + } + } + } + } +} + +/* Transform basic blocks FROM, TO (which may be the same block) into: + if (GOACC_single_start ()) + BLOCK; + GOACC_barrier (); + \ | / + +----+ + | | (new) predicate block + +----+-- + \ | / \ | / |t \ + +----+ +----+ +----+ | + | | | | ===> | | | f (old) from block + +----+ +----+ +----+ | + | t/ \f | / + +----+/ + (split (split before | | skip block + at end) condition) +----+ + t/ \f +*/ + +static void +worker_single_simple (basic_block from, basic_block to, + hash_set *def_escapes_block) +{ + gimple *call, *cond; + tree lhs, decl; + basic_block skip_block; + + gimple_stmt_iterator gsi = gsi_last_bb (to); + if (EDGE_COUNT (to->succs) > 1) + { + gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_COND); + gsi_prev (&gsi); + } + edge e = split_block (to, gsi_stmt (gsi)); + skip_block = e->dest; + + gimple_stmt_iterator start = gsi_after_labels (from); + + decl = builtin_decl_explicit (BUILT_IN_GOACC_SINGLE_START); + lhs = create_tmp_var (TREE_TYPE (TREE_TYPE (decl))); + call = gimple_build_call (decl, 0); + gimple_call_set_lhs (call, lhs); + gsi_insert_before (&start, call, GSI_NEW_STMT); + update_stmt (call); + + cond = gimple_build_cond (EQ_EXPR, lhs, + fold_convert_loc (UNKNOWN_LOCATION, + TREE_TYPE (lhs), + boolean_true_node), + NULL_TREE, NULL_TREE); + gsi_insert_after (&start, cond, GSI_NEW_STMT); + update_stmt (cond); + + edge et = split_block (from, cond); + et->flags &= ~EDGE_FALLTHRU; + et->flags |= EDGE_TRUE_VALUE; + /* Make the active worker the more probable path so we prefer fallthrough + (letting 
the idle workers jump around more). */ + et->probability = profile_probability::likely (); + + edge ef = make_edge (from, skip_block, EDGE_FALSE_VALUE); + ef->probability = et->probability.invert (); + + basic_block neutered = split_edge (ef); + gimple_stmt_iterator neut_gsi = gsi_last_bb (neutered); + + for (gsi = gsi_start_bb (et->dest); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + ssa_op_iter iter; + tree var; + + FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF) + { + if (def_escapes_block->contains (var)) + { + gphi *join_phi = create_phi_node (NULL_TREE, skip_block); + tree res = create_new_def_for (var, join_phi, + gimple_phi_result_ptr (join_phi)); + add_phi_arg (join_phi, var, e, UNKNOWN_LOCATION); + + tree neutered_def = copy_ssa_name (var, NULL); + /* We really want "don't care" or some value representing + undefined here, but optimizers will probably get rid of the + zero-assignments anyway. */ + gassign *zero = gimple_build_assign (neutered_def, + build_zero_cst (TREE_TYPE (neutered_def))); + + gsi_insert_after (&neut_gsi, zero, GSI_CONTINUE_LINKING); + update_stmt (zero); + + add_phi_arg (join_phi, neutered_def, single_succ_edge (neutered), + UNKNOWN_LOCATION); + update_stmt (join_phi); + } + } + } + + gsi = gsi_start_bb (skip_block); + + decl = builtin_decl_explicit (BUILT_IN_GOACC_BARRIER); + gimple *acc_bar = gimple_build_call (decl, 0); + + gsi_insert_before (&gsi, acc_bar, GSI_SAME_STMT); + update_stmt (acc_bar); +} + +/* This is a copied and renamed omp-low.c:omp_build_component_ref. 
*/ + +static tree +oacc_build_component_ref (tree obj, tree field) +{ + tree ret = build3 (COMPONENT_REF, TREE_TYPE (field), obj, field, NULL); + if (TREE_THIS_VOLATILE (field)) + TREE_THIS_VOLATILE (ret) |= 1; + if (TREE_READONLY (field)) + TREE_READONLY (ret) |= 1; + return ret; +} + +static tree +build_receiver_ref (tree record_type, tree var, tree receiver_decl) +{ + field_map_t *fields = *field_map->get (record_type); + tree x = build_simple_mem_ref (receiver_decl); + tree field = *fields->get (var); + TREE_THIS_NOTRAP (x) = 1; + x = oacc_build_component_ref (x, field); + return x; +} + +static tree +build_sender_ref (tree record_type, tree var, tree sender_decl) +{ + field_map_t *fields = *field_map->get (record_type); + tree field = *fields->get (var); + return oacc_build_component_ref (sender_decl, field); +} + +static int +sort_by_ssa_version_or_uid (const void *p1, const void *p2) +{ + const tree t1 = *(const tree *)p1; + const tree t2 = *(const tree *)p2; + + if (TREE_CODE (t1) == SSA_NAME && TREE_CODE (t2) == SSA_NAME) + return SSA_NAME_VERSION (t1) - SSA_NAME_VERSION (t2); + else if (TREE_CODE (t1) == SSA_NAME && TREE_CODE (t2) != SSA_NAME) + return -1; + else if (TREE_CODE (t1) != SSA_NAME && TREE_CODE (t2) == SSA_NAME) + return 1; + else + return DECL_UID (t1) - DECL_UID (t2); +} + +static int +sort_by_size_then_ssa_version_or_uid (const void *p1, const void *p2) +{ + const tree t1 = *(const tree *)p1; + const tree t2 = *(const tree *)p2; + unsigned HOST_WIDE_INT s1 = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (t1))); + unsigned HOST_WIDE_INT s2 = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (t2))); + if (s1 != s2) + return s2 - s1; + else + return sort_by_ssa_version_or_uid (p1, p2); +} + +static void +worker_single_copy (basic_block from, basic_block to, + hash_set *def_escapes_block, + hash_set *worker_partitioned_uses, + tree record_type) +{ + /* If we only have virtual defs, we'll have no record type, but we still want + to emit single_copy_start and 
(particularly) single_copy_end to act as + a vdef source on the neutered edge representing memory writes on the + non-neutered edge. */ + if (!record_type) + record_type = char_type_node; + + tree sender_decl + = targetm.goacc.create_propagation_record (record_type, true, + ".oacc_worker_o"); + tree receiver_decl + = targetm.goacc.create_propagation_record (record_type, false, + ".oacc_worker_i"); + + gimple_stmt_iterator gsi = gsi_last_bb (to); + if (EDGE_COUNT (to->succs) > 1) + gsi_prev (&gsi); + edge e = split_block (to, gsi_stmt (gsi)); + basic_block barrier_block = e->dest; + + gimple_stmt_iterator start = gsi_after_labels (from); + + tree decl = builtin_decl_explicit (BUILT_IN_GOACC_SINGLE_COPY_START); + + tree lhs = create_tmp_var (TREE_TYPE (TREE_TYPE (decl))); + + gimple *call = gimple_build_call (decl, 1, + build_fold_addr_expr (sender_decl)); + gimple_call_set_lhs (call, lhs); + gsi_insert_before (&start, call, GSI_NEW_STMT); + update_stmt (call); + + tree conv_tmp = make_ssa_name (TREE_TYPE (receiver_decl)); + + gimple *conv = gimple_build_assign (conv_tmp, + fold_convert (TREE_TYPE (receiver_decl), + lhs)); + update_stmt (conv); + gsi_insert_after (&start, conv, GSI_NEW_STMT); + gimple *asgn = gimple_build_assign (receiver_decl, conv_tmp); + gsi_insert_after (&start, asgn, GSI_NEW_STMT); + update_stmt (asgn); + + tree zero_ptr = build_int_cst (TREE_TYPE (receiver_decl), 0); + + tree recv_tmp = make_ssa_name (TREE_TYPE (receiver_decl)); + asgn = gimple_build_assign (recv_tmp, receiver_decl); + gsi_insert_after (&start, asgn, GSI_NEW_STMT); + update_stmt (asgn); + + gimple *cond = gimple_build_cond (EQ_EXPR, recv_tmp, zero_ptr, NULL_TREE, + NULL_TREE); + update_stmt (cond); + + gsi_insert_after (&start, cond, GSI_NEW_STMT); + + edge et = split_block (from, cond); + et->flags &= ~EDGE_FALLTHRU; + et->flags |= EDGE_TRUE_VALUE; + /* Make the active worker the more probable path so we prefer fallthrough + (letting the idle workers jump around more). 
*/ + et->probability = profile_probability::likely (); + + basic_block body = et->dest; + + edge ef = make_edge (from, barrier_block, EDGE_FALSE_VALUE); + ef->probability = et->probability.invert (); + + decl = builtin_decl_explicit (BUILT_IN_GOACC_BARRIER); + gimple *acc_bar = gimple_build_call (decl, 0); + + gimple_stmt_iterator bar_gsi = gsi_start_bb (barrier_block); + gsi_insert_before (&bar_gsi, acc_bar, GSI_NEW_STMT); + + cond = gimple_build_cond (NE_EXPR, recv_tmp, zero_ptr, NULL_TREE, NULL_TREE); + gsi_insert_after (&bar_gsi, cond, GSI_NEW_STMT); + + edge et2 = split_block (barrier_block, cond); + et2->flags &= ~EDGE_FALLTHRU; + et2->flags |= EDGE_TRUE_VALUE; + et2->probability = profile_probability::unlikely (); + + basic_block exit_block = et2->dest; + + basic_block copyout_block = split_edge (et2); + edge ef2 = make_edge (barrier_block, exit_block, EDGE_FALSE_VALUE); + ef2->probability = et2->probability.invert (); + + gimple_stmt_iterator copyout_gsi = gsi_start_bb (copyout_block); + + edge copyout_to_exit = single_succ_edge (copyout_block); + + gimple_seq sender_seq = NULL; + + /* Make sure we iterate over definitions in a stable order. */ + auto_vec escape_vec (def_escapes_block->elements ()); + for (hash_set::iterator it = def_escapes_block->begin (); + it != def_escapes_block->end (); ++it) + escape_vec.quick_push (*it); + escape_vec.qsort (sort_by_ssa_version_or_uid); + + for (unsigned i = 0; i < escape_vec.length (); i++) + { + tree var = escape_vec[i]; + + if (TREE_CODE (var) == SSA_NAME && SSA_NAME_IS_VIRTUAL_OPERAND (var)) + continue; + + tree barrier_def = 0; + + if (TREE_CODE (var) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (var); + + if (gimple_nop_p (def_stmt)) + continue; + + /* The barrier phi takes one result from the actual work of the + block we're neutering, and the other result is constant zero of + the same type. 
*/ + + gphi *barrier_phi = create_phi_node (NULL_TREE, barrier_block); + barrier_def = create_new_def_for (var, barrier_phi, + gimple_phi_result_ptr (barrier_phi)); + + add_phi_arg (barrier_phi, var, e, UNKNOWN_LOCATION); + add_phi_arg (barrier_phi, build_zero_cst (TREE_TYPE (var)), ef, + UNKNOWN_LOCATION); + + update_stmt (barrier_phi); + } + else + gcc_assert (TREE_CODE (var) == VAR_DECL); + + /* If we had no record type, we will have no fields map. */ + field_map_t **fields_p = field_map->get (record_type); + field_map_t *fields = fields_p ? *fields_p : NULL; + + if (worker_partitioned_uses->contains (var) + && fields + && fields->get (var)) + { + tree neutered_def = make_ssa_name (TREE_TYPE (var)); + + /* Receive definition from shared memory block. */ + + tree receiver_ref = build_receiver_ref (record_type, var, + receiver_decl); + gassign *recv = gimple_build_assign (neutered_def, + receiver_ref); + gsi_insert_after (&copyout_gsi, recv, GSI_CONTINUE_LINKING); + update_stmt (recv); + + if (TREE_CODE (var) == VAR_DECL) + { + /* If it's a VAR_DECL, we only copied to an SSA temporary. Copy + to the final location now. */ + gassign *asgn = gimple_build_assign (var, neutered_def); + gsi_insert_after (&copyout_gsi, asgn, GSI_CONTINUE_LINKING); + update_stmt (asgn); + } + else + { + /* If it's an SSA name, create a new phi at the join node to + represent either the output from the active worker (the + barrier) or the inactive workers (the copyout block). */ + gphi *join_phi = create_phi_node (NULL_TREE, exit_block); + create_new_def_for (barrier_def, join_phi, + gimple_phi_result_ptr (join_phi)); + add_phi_arg (join_phi, barrier_def, ef2, UNKNOWN_LOCATION); + add_phi_arg (join_phi, neutered_def, copyout_to_exit, + UNKNOWN_LOCATION); + update_stmt (join_phi); + } + + /* Send definition to shared memory block.
*/ + + tree sender_ref = build_sender_ref (record_type, var, sender_decl); + + if (TREE_CODE (var) == SSA_NAME) + { + gassign *send = gimple_build_assign (sender_ref, var); + gimple_seq_add_stmt (&sender_seq, send); + update_stmt (send); + } + else if (TREE_CODE (var) == VAR_DECL) + { + tree tmp = make_ssa_name (TREE_TYPE (var)); + gassign *send = gimple_build_assign (tmp, var); + gimple_seq_add_stmt (&sender_seq, send); + update_stmt (send); + send = gimple_build_assign (sender_ref, tmp); + gimple_seq_add_stmt (&sender_seq, send); + update_stmt (send); + } + else + gcc_unreachable (); + } + } + + /* It's possible for the ET->DEST block (the work done by the active thread) + to finish with a control-flow insn, e.g. a UNIQUE function call. Split + the block and add SENDER_SEQ in the latter part to avoid having control + flow in the middle of a BB. */ + + decl = builtin_decl_explicit (BUILT_IN_GOACC_SINGLE_COPY_END); + call = gimple_build_call (decl, 1, build_fold_addr_expr (sender_decl)); + gimple_seq_add_stmt (&sender_seq, call); + + gsi = gsi_last_bb (body); + gimple *last = gsi_stmt (gsi); + basic_block sender_block = split_block (body, last)->dest; + gsi = gsi_last_bb (sender_block); + gsi_insert_seq_after (&gsi, sender_seq, GSI_CONTINUE_LINKING); +} + +static void +neuter_worker_single (parallel *par, unsigned outer_mask, bitmap worker_single, + bitmap vector_single, vec *prop_set, + hash_set *partitioned_var_uses) +{ + unsigned mask = outer_mask | par->mask; + + if ((mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) == 0) + { + basic_block block; + + for (unsigned i = 0; par->blocks.iterate (i, &block); i++) + { + bool has_defs = false; + hash_set def_escapes_block; + hash_set worker_partitioned_uses; + unsigned j; + tree var; + + FOR_EACH_SSA_NAME (j, var, cfun) + { + if (SSA_NAME_IS_VIRTUAL_OPERAND (var)) + { + has_defs = true; + continue; + } + + gimple *def_stmt = SSA_NAME_DEF_STMT (var); + + if (gimple_nop_p (def_stmt)) + continue; + + if (gimple_bb 
(def_stmt)->index != block->index) + continue; + + gimple *use_stmt; + imm_use_iterator use_iter; + bool uses_outside_block = false; + bool worker_partitioned_use = false; + + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, var) + { + int blocknum = gimple_bb (use_stmt)->index; + + /* Don't propagate SSA names that are only used in the + current block, unless the usage is in a phi node: that + means the name left the block, then came back in at the + top. */ + if (blocknum != block->index + || gimple_code (use_stmt) == GIMPLE_PHI) + uses_outside_block = true; + if (!bitmap_bit_p (worker_single, blocknum)) + worker_partitioned_use = true; + } + + if (uses_outside_block) + def_escapes_block.add (var); + + if (worker_partitioned_use) + { + worker_partitioned_uses.add (var); + has_defs = true; + } + } + + propagation_set *ws_prop = (*prop_set)[block->index]; + + if (ws_prop) + { + for (propagation_set::iterator it = ws_prop->begin (); + it != ws_prop->end (); + ++it) + { + tree var = *it; + if (TREE_CODE (var) == VAR_DECL) + { + def_escapes_block.add (var); + if (partitioned_var_uses->contains (var)) + { + worker_partitioned_uses.add (var); + has_defs = true; + } + } + } + + delete ws_prop; + (*prop_set)[block->index] = 0; + } + + tree record_type = (tree) block->aux; + + if (has_defs) + worker_single_copy (block, block, &def_escapes_block, + &worker_partitioned_uses, record_type); + else + worker_single_simple (block, block, &def_escapes_block); + } + } + + if ((outer_mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) == 0) + { + basic_block block; + + for (unsigned i = 0; par->blocks.iterate (i, &block); i++) + for (gimple_stmt_iterator gsi = gsi_start_bb (block); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) == GIMPLE_CALL + && !gimple_call_internal_p (stmt) + && !omp_sese_active_worker_call (as_a (stmt))) + { + /* If we have an OpenACC routine call in worker-single mode, + place barriers before and afterwards to prevent + 
clobbering re-used shared memory regions (as are used + for AMDGCN at present, for example). */ + tree decl = builtin_decl_explicit (BUILT_IN_GOACC_BARRIER); + gsi_insert_before (&gsi, gimple_build_call (decl, 0), + GSI_SAME_STMT); + gsi_insert_after (&gsi, gimple_build_call (decl, 0), + GSI_NEW_STMT); + } + } + } + + if (par->inner) + neuter_worker_single (par->inner, mask, worker_single, vector_single, + prop_set, partitioned_var_uses); + if (par->next) + neuter_worker_single (par->next, outer_mask, worker_single, vector_single, + prop_set, partitioned_var_uses); +} + + +void +oacc_do_neutering (void) +{ + bb_stmt_map_t bb_stmt_map; + auto_bitmap worker_single, vector_single; + + omp_sese_split_blocks (&bb_stmt_map); + + if (dump_file) + { + fprintf (dump_file, "\n\nAfter splitting:\n\n"); + dump_function_to_file (current_function_decl, dump_file, dump_flags); + } + + unsigned mask = 0; + + /* If this is a routine, calculate MASK as if the outer levels are already + partitioned. */ + tree attr = oacc_get_fn_attrib (current_function_decl); + if (attr) + { + tree dims = TREE_VALUE (attr); + unsigned ix; + for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims)) + { + tree allowed = TREE_PURPOSE (dims); + if (allowed && integer_zerop (allowed)) + mask |= GOMP_DIM_MASK (ix); + } + } + + parallel *par = omp_sese_discover_pars (&bb_stmt_map); + populate_single_mode_bitmaps (par, worker_single, vector_single, mask, 0); + + basic_block bb; + FOR_ALL_BB_FN (bb, cfun) + bb->aux = NULL; + + field_map = record_field_map_t::create_ggc (40); + + vec prop_set; + prop_set.create (last_basic_block_for_fn (cfun)); + + for (unsigned i = 0; i < last_basic_block_for_fn (cfun); i++) + prop_set.quick_push (0); + + find_ssa_names_to_propagate (par, mask, worker_single, vector_single, + &prop_set); + + hash_set partitioned_var_uses; + + find_partitioned_var_uses (par, mask, &partitioned_var_uses); + find_local_vars_to_propagate (par, mask, &partitioned_var_uses, &prop_set); + + 
FOR_ALL_BB_FN (bb, cfun) + { + propagation_set *ws_prop = prop_set[bb->index]; + if (ws_prop) + { + tree record_type = lang_hooks.types.make_type (RECORD_TYPE); + tree name = create_tmp_var_name (".oacc_ws_data_s"); + name = build_decl (UNKNOWN_LOCATION, TYPE_DECL, name, record_type); + DECL_ARTIFICIAL (name) = 1; + DECL_NAMELESS (name) = 1; + TYPE_NAME (record_type) = name; + TYPE_ARTIFICIAL (record_type) = 1; + + auto_vec field_vec (ws_prop->elements ()); + for (hash_set::iterator it = ws_prop->begin (); + it != ws_prop->end (); ++it) + field_vec.quick_push (*it); + + field_vec.qsort (sort_by_size_then_ssa_version_or_uid); + + field_map->put (record_type, field_map_t::create_ggc (17)); + + /* Insert var fields in reverse order, so the last inserted element + is the first in the structure. */ + for (int i = field_vec.length () - 1; i >= 0; i--) + install_var_field (field_vec[i], record_type); + + layout_type (record_type); + + bb->aux = (tree) record_type; + } + } + + neuter_worker_single (par, mask, worker_single, vector_single, &prop_set, + &partitioned_var_uses); + + prop_set.release (); + + /* This doesn't seem to make a difference. */ + loops_state_clear (LOOP_CLOSED_SSA); + + /* Neutering worker-single neutered blocks will invalidate dominance info. + It may be possible to incrementally update just the affected blocks, but + obliterate everything for now. */ + free_dominance_info (CDI_DOMINATORS); + free_dominance_info (CDI_POST_DOMINATORS); + + if (dump_file) + { + fprintf (dump_file, "\n\nAfter neutering:\n\n"); + dump_function_to_file (current_function_decl, dump_file, dump_flags); + } +} + +/* Analyse a group of BBs within a partitioned region and create N + Single-Entry-Single-Exit regions. Some of those regions will be + trivial ones consisting of a single BB. The blocks of a + partitioned region might form a set of disjoint graphs -- because + the region encloses a differently partitioned sub region. 
+ + We use the linear time algorithm described in 'Finding Regions Fast: + Single Entry Single Exit and Control Regions in Linear Time' + Johnson, Pearson & Pingali. That algorithm deals with complete + CFGs, where a back edge is inserted from END to START, and thus the + problem becomes one of finding equivalent loops. + + In this case we have a partial CFG. We complete it by redirecting + any incoming edge to the graph to be from an arbitrary external BB, + and similarly redirecting any outgoing edge to be to that BB. + Thus we end up with a closed graph. + + The algorithm works by building a spanning tree of an undirected + graph and keeping track of back edges from nodes further from the + root in the tree to nodes nearer to the root in the tree. In the + description below, the root is up and the tree grows downwards. + + We avoid having to deal with degenerate back-edges to the same + block, by splitting each BB into 3 -- one for input edges, one for + the node itself and one for the output edges. Such back edges are + referred to as 'Brackets'. Cycle equivalent nodes will have the + same set of brackets. + + Determining bracket equivalency is done by maintaining a list of + brackets in such a manner that the list length and final bracket + uniquely identify the set. + + We use coloring to mark all BBs with cycle equivalency with the + same color. This is the output of the 'Finding Regions Fast' + algorithm. Notice it doesn't actually find the set of nodes within + a particular region, just unordered sets of nodes that are the + entries and exits of SESE regions. + + After determining cycle equivalency, we need to find the minimal + set of SESE regions. Do this with a DFS coloring walk of the + complete graph. We're either 'looking' or 'coloring'. When + looking, and we're in the subgraph, we start coloring the color of + the current node, and remember that node as the start of the + current color's SESE region. 
Every time we go to a new node, we + decrement the count of nodes with that color. If it reaches zero, + we remember that node as the end of the current color's SESE region + and return to 'looking'. Otherwise we color the node the current + color. + + This way we end up with coloring the inside of non-trivial SESE + regions with the color of that region. */ + +/* A pair of BBs. We use this to represent SESE regions. */ +typedef std::pair bb_pair_t; +typedef auto_vec bb_pair_vec_t; + +/* A node in the undirected CFG. The discriminator SECOND indicates just + above or just below the BB indicated by FIRST. */ +typedef std::pair pseudo_node_t; + +/* A bracket indicates an edge towards the root of the spanning tree of the + undirected graph. Each bracket has a color, determined + from the current set of brackets. */ +struct bracket +{ + pseudo_node_t back; /* Back target */ + + /* Current color and size of set. */ + unsigned color; + unsigned size; + + bracket (pseudo_node_t back_) + : back (back_), color (~0u), size (~0u) + { + } + + unsigned get_color (auto_vec &color_counts, unsigned length) + { + if (length != size) + { + size = length; + color = color_counts.length (); + color_counts.quick_push (0); + } + color_counts[color]++; + return color; + } +}; + +typedef auto_vec bracket_vec_t; + +/* Basic block info for finding SESE regions. */ + +struct bb_sese +{ + int node; /* Node number in spanning tree. */ + int parent; /* Parent node number. */ + + /* The algorithm splits each node A into Ai, A', Ao. The incoming + edges arrive at pseudo-node Ai and the outgoing edges leave at + pseudo-node Ao. We have to remember which way we arrived at a + particular node when generating the spanning tree. dir > 0 means + we arrived at Ai, dir < 0 means we arrived at Ao. */ + int dir; + + /* Lowest numbered pseudo-node reached via a backedge from this + node, or any descendant. 
*/ + pseudo_node_t high; + + int color; /* Cycle-equivalence color */ + + /* Stack of brackets for this node. */ + bracket_vec_t brackets; + + bb_sese (unsigned node_, unsigned p, int dir_) + :node (node_), parent (p), dir (dir_) + { + } + ~bb_sese (); + + /* Push a bracket ending at BACK. */ + void push (const pseudo_node_t &back) + { + if (dump_file) + fprintf (dump_file, "Pushing backedge %d:%+d\n", + back.first ? back.first->index : 0, back.second); + brackets.safe_push (bracket (back)); + } + + void append (bb_sese *child); + void remove (const pseudo_node_t &); + + /* Set node's color. */ + void set_color (auto_vec &color_counts) + { + color = brackets.last ().get_color (color_counts, brackets.length ()); + } +}; + +bb_sese::~bb_sese () +{ +} + +/* Destructively append CHILD's brackets. */ + +void +bb_sese::append (bb_sese *child) +{ + if (int len = child->brackets.length ()) + { + int ix; + + if (dump_file) + { + for (ix = 0; ix < len; ix++) + { + const pseudo_node_t &pseudo = child->brackets[ix].back; + fprintf (dump_file, "Appending (%d)'s backedge %d:%+d\n", + child->node, pseudo.first ? pseudo.first->index : 0, + pseudo.second); + } + } + if (!brackets.length ()) + std::swap (brackets, child->brackets); + else + { + brackets.reserve (len); + for (ix = 0; ix < len; ix++) + brackets.quick_push (child->brackets[ix]); + } + } +} + +/* Remove brackets that terminate at PSEUDO. */ + +void +bb_sese::remove (const pseudo_node_t &pseudo) +{ + unsigned removed = 0; + int len = brackets.length (); + + for (int ix = 0; ix < len; ix++) + { + if (brackets[ix].back == pseudo) + { + if (dump_file) + fprintf (dump_file, "Removing backedge %d:%+d\n", + pseudo.first ? pseudo.first->index : 0, pseudo.second); + removed++; + } + else if (removed) + brackets[ix-removed] = brackets[ix]; + } + while (removed--) + brackets.pop (); +} + +/* Accessors for BB's aux pointer. 
*/ +#define BB_SET_SESE(B, S) ((B)->aux = (S)) +#define BB_GET_SESE(B) ((bb_sese *)(B)->aux) + +/* DFS walk creating SESE data structures. Only cover nodes with + BB_VISITED set. Append discovered blocks to LIST. We number in + increments of 3 so that the above and below pseudo nodes can be + implicitly numbered too. */ + +static int +omp_sese_number (int n, int p, int dir, basic_block b, + auto_vec *list) +{ + if (BB_GET_SESE (b)) + return n; + + if (dump_file) + fprintf (dump_file, "Block %d(%d), parent (%d), orientation %+d\n", + b->index, n, p, dir); + + BB_SET_SESE (b, new bb_sese (n, p, dir)); + p = n; + + n += 3; + list->quick_push (b); + + /* First walk the nodes on the 'other side' of this node, then walk + the nodes on the same side. */ + for (unsigned ix = 2; ix; ix--) + { + vec *edges = dir > 0 ? b->succs : b->preds; + size_t offset = (dir > 0 ? offsetof (edge_def, dest) + : offsetof (edge_def, src)); + edge e; + edge_iterator (ei); + + FOR_EACH_EDGE (e, ei, edges) + { + basic_block target = *(basic_block *)((char *)e + offset); + + if (target->flags & BB_VISITED) + n = omp_sese_number (n, p, dir, target, list); + } + dir = -dir; + } + return n; +} + +/* Process pseudo node above (DIR < 0) or below (DIR > 0) ME. + EDGES are the outgoing edges and OFFSET is the offset to the src + or dst block on the edges. */ + +static void +omp_sese_pseudo (basic_block me, bb_sese *sese, int depth, int dir, + vec *edges, size_t offset) +{ + edge e; + edge_iterator (ei); + int hi_back = depth; + pseudo_node_t node_back (0, depth); + int hi_child = depth; + pseudo_node_t node_child (0, depth); + basic_block child = NULL; + unsigned num_children = 0; + int usd = -dir * sese->dir; + + if (dump_file) + fprintf (dump_file, "\nProcessing %d(%d) %+d\n", + me->index, sese->node, dir); + + if (dir < 0) + { + /* This is the above pseudo-child. It has the BB itself as an + additional child node. 
*/ + node_child = sese->high; + hi_child = node_child.second; + if (node_child.first) + hi_child += BB_GET_SESE (node_child.first)->node; + num_children++; + } + + /* Examine each edge. + - if it is a child (a) append its bracket list and (b) record + whether it is the child with the highest reaching bracket. + - if it is an edge to ancestor, record whether it's the highest + reaching backlink. */ + FOR_EACH_EDGE (e, ei, edges) + { + basic_block target = *(basic_block *)((char *)e + offset); + + if (bb_sese *t_sese = BB_GET_SESE (target)) + { + if (t_sese->parent == sese->node && !(t_sese->dir + usd)) + { + /* Child node. Append its bracket list. */ + num_children++; + sese->append (t_sese); + + /* Compare its hi value. */ + int t_hi = t_sese->high.second; + + if (basic_block child_hi_block = t_sese->high.first) + t_hi += BB_GET_SESE (child_hi_block)->node; + + if (hi_child > t_hi) + { + hi_child = t_hi; + node_child = t_sese->high; + child = target; + } + } + else if (t_sese->node < sese->node + dir + && !(dir < 0 && sese->parent == t_sese->node)) + { + /* Non-parental ancestor node -- a backlink. */ + int d = usd * t_sese->dir; + int back = t_sese->node + d; + + if (hi_back > back) + { + hi_back = back; + node_back = pseudo_node_t (target, d); + } + } + } + else + { /* Fallen off graph, backlink to entry node. */ + hi_back = 0; + node_back = pseudo_node_t (0, 0); + } + } + + /* Remove any brackets that terminate at this pseudo node. */ + sese->remove (pseudo_node_t (me, dir)); + + /* Now push any backlinks from this pseudo node. */ + FOR_EACH_EDGE (e, ei, edges) + { + basic_block target = *(basic_block *)((char *)e + offset); + if (bb_sese *t_sese = BB_GET_SESE (target)) + { + if (t_sese->node < sese->node + dir + && !(dir < 0 && sese->parent == t_sese->node)) + /* Non-parental ancestor node - backedge from me. 
*/ + sese->push (pseudo_node_t (target, usd * t_sese->dir)); + } + else + { + /* back edge to entry node */ + sese->push (pseudo_node_t (0, 0)); + } + } + + /* If this node leads directly or indirectly to a no-return region of + the graph, then fake a backedge to entry node. */ + if (!sese->brackets.length () || !edges || !edges->length ()) + { + hi_back = 0; + node_back = pseudo_node_t (0, 0); + sese->push (node_back); + } + + /* Record the highest reaching backedge from us or a descendant. */ + sese->high = hi_back < hi_child ? node_back : node_child; + + if (num_children > 1) + { + /* There is more than one child -- this is a Y shaped piece of + spanning tree. We have to insert a fake backedge from this + node to the highest ancestor reached by not-the-highest + reaching child. Note that there may be multiple children + with backedges to the same highest node. That's ok and we + insert the edge to that highest node. */ + hi_child = depth; + if (dir < 0 && child) + { + node_child = sese->high; + hi_child = node_child.second; + if (node_child.first) + hi_child += BB_GET_SESE (node_child.first)->node; + } + + FOR_EACH_EDGE (e, ei, edges) + { + basic_block target = *(basic_block *)((char *)e + offset); + + if (target == child) + /* Ignore the highest child. */ + continue; + + bb_sese *t_sese = BB_GET_SESE (target); + if (!t_sese) + continue; + if (t_sese->parent != sese->node) + /* Not a child. */ + continue; + + /* Compare its hi value. */ + int t_hi = t_sese->high.second; + + if (basic_block child_hi_block = t_sese->high.first) + t_hi += BB_GET_SESE (child_hi_block)->node; + + if (hi_child > t_hi) + { + hi_child = t_hi; + node_child = t_sese->high; + } + } + + sese->push (node_child); + } +} + + +/* DFS walk of BB graph. Color node BLOCK according to COLORING then + proceed to successors. Set SESE entry and exit nodes of + REGIONS. 
*/ + +static void +omp_sese_color (auto_vec &color_counts, bb_pair_vec_t &regions, + basic_block block, int coloring) +{ + bb_sese *sese = BB_GET_SESE (block); + + if (block->flags & BB_VISITED) + { + /* If we've already encountered this block, either we must not + be coloring, or it must have been colored the current color. */ + gcc_assert (coloring < 0 || (sese && coloring == sese->color)); + return; + } + + block->flags |= BB_VISITED; + + if (sese) + { + if (coloring < 0) + { + /* Start coloring a region. */ + regions[sese->color].first = block; + coloring = sese->color; + } + + if (!--color_counts[sese->color] && sese->color == coloring) + { + /* Found final block of SESE region. */ + regions[sese->color].second = block; + coloring = -1; + } + else + /* Color the node, so we can assert on revisiting the node + that the graph is indeed SESE. */ + sese->color = coloring; + } + else + /* Fallen off the subgraph, we cannot be coloring. */ + gcc_assert (coloring < 0); + + /* Walk each successor block. */ + if (block->succs && block->succs->length ()) + { + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, block->succs) + omp_sese_color (color_counts, regions, e->dest, coloring); + } + else + gcc_assert (coloring < 0); +} + +/* Find minimal set of SESE regions covering BLOCKS. REGIONS might + end up with NULL entries in it. */ + +static void +omp_find_sese (auto_vec &blocks, bb_pair_vec_t &regions) +{ + basic_block block; + int ix; + + /* First clear each BB of the whole function. */ + FOR_EACH_BB_FN (block, cfun) + { + block->flags &= ~BB_VISITED; + BB_SET_SESE (block, 0); + } + block = EXIT_BLOCK_PTR_FOR_FN (cfun); + block->flags &= ~BB_VISITED; + BB_SET_SESE (block, 0); + block = ENTRY_BLOCK_PTR_FOR_FN (cfun); + block->flags &= ~BB_VISITED; + BB_SET_SESE (block, 0); + + /* Mark blocks in the function that are in this graph. */ + for (ix = 0; blocks.iterate (ix, &block); ix++) + block->flags |= BB_VISITED; + + /* Counts of nodes assigned to each color. 
There cannot be more + colors than blocks (and hopefully there will be fewer). */ + auto_vec color_counts; + color_counts.reserve (blocks.length ()); + + /* Worklist of nodes in the spanning tree. Again, there cannot be + more nodes in the tree than blocks (there will be fewer if the + CFG of blocks is disjoint). */ + auto_vec spanlist; + spanlist.reserve (blocks.length ()); + + /* Make sure every block has its cycle class determined. */ + for (ix = 0; blocks.iterate (ix, &block); ix++) + { + if (BB_GET_SESE (block)) + /* We already met this block in an earlier graph solve. */ + continue; + + if (dump_file) + fprintf (dump_file, "Searching graph starting at %d\n", block->index); + + /* Number the nodes reachable from block initial DFS order. */ + int depth = omp_sese_number (2, 0, +1, block, &spanlist); + + /* Now walk in reverse DFS order to find cycle equivalents. */ + while (spanlist.length ()) + { + block = spanlist.pop (); + bb_sese *sese = BB_GET_SESE (block); + + /* Do the pseudo node below. */ + omp_sese_pseudo (block, sese, depth, +1, + sese->dir > 0 ? block->succs : block->preds, + (sese->dir > 0 ? offsetof (edge_def, dest) + : offsetof (edge_def, src))); + sese->set_color (color_counts); + /* Do the pseudo node above. */ + omp_sese_pseudo (block, sese, depth, -1, + sese->dir < 0 ? block->succs : block->preds, + (sese->dir < 0 ? 
offsetof (edge_def, dest) + : offsetof (edge_def, src))); + } + if (dump_file) + fprintf (dump_file, "\n"); + } + + if (dump_file) + { + unsigned count; + const char *comma = ""; + + fprintf (dump_file, "Found %d cycle equivalents\n", + color_counts.length ()); + for (ix = 0; color_counts.iterate (ix, &count); ix++) + { + fprintf (dump_file, "%s%d[%d]={", comma, ix, count); + + comma = ""; + for (unsigned jx = 0; blocks.iterate (jx, &block); jx++) + if (BB_GET_SESE (block)->color == ix) + { + block->flags |= BB_VISITED; + fprintf (dump_file, "%s%d", comma, block->index); + comma=","; + } + fprintf (dump_file, "}"); + comma = ", "; + } + fprintf (dump_file, "\n"); + } + + /* Now we've colored every block in the subgraph. We now need to + determine the minimal set of SESE regions that cover that + subgraph. Do this with a DFS walk of the complete function. + During the walk we're either 'looking' or 'coloring'. When we + reach the last node of a particular color, we stop coloring and + return to looking. */ + + /* There cannot be more SESE regions than colors. */ + regions.reserve (color_counts.length ()); + for (ix = color_counts.length (); ix--;) + regions.quick_push (bb_pair_t (0, 0)); + + for (ix = 0; blocks.iterate (ix, &block); ix++) + block->flags &= ~BB_VISITED; + + omp_sese_color (color_counts, regions, ENTRY_BLOCK_PTR_FOR_FN (cfun), -1); + + if (dump_file) + { + const char *comma = ""; + int len = regions.length (); + + fprintf (dump_file, "SESE regions:"); + for (ix = 0; ix != len; ix++) + { + basic_block from = regions[ix].first; + basic_block to = regions[ix].second; + + if (from) + { + fprintf (dump_file, "%s %d{%d", comma, ix, from->index); + if (to != from) + fprintf (dump_file, "->%d", to->index); + + int color = BB_GET_SESE (from)->color; + + /* Print the blocks within the region (excluding ends). 
*/ + FOR_EACH_BB_FN (block, cfun) + { + bb_sese *sese = BB_GET_SESE (block); + + if (sese && sese->color == color + && block != from && block != to) + fprintf (dump_file, ".%d", block->index); + } + fprintf (dump_file, "}"); + } + comma = ","; + } + fprintf (dump_file, "\n\n"); + } + + for (ix = 0; blocks.iterate (ix, &block); ix++) + delete BB_GET_SESE (block); +} + +#undef BB_SET_SESE +#undef BB_GET_SESE diff --git a/gcc/omp-sese.h b/gcc/omp-sese.h new file mode 100644 index 00000000000..57290eb9d65 --- /dev/null +++ b/gcc/omp-sese.h @@ -0,0 +1,26 @@ +/* Find single-entry, single-exit regions for OpenACC. + + Copyright (C) 2005-2017 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#ifndef GCC_OMP_SESE_H +#define GCC_OMP_SESE_H + +extern void oacc_do_neutering (void); + +#endif diff --git a/gcc/passes.def b/gcc/passes.def index f4c4b96d96d..e9c13bf7768 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -178,6 +178,8 @@ along with GCC; see the file COPYING3. 
If not see INSERT_PASSES_AFTER (all_passes) NEXT_PASS (pass_fixup_cfg); NEXT_PASS (pass_lower_eh_dispatch); + NEXT_PASS (pass_oacc_loop_designation); + NEXT_PASS (pass_oacc_gimple_workers); NEXT_PASS (pass_oacc_device_lower); NEXT_PASS (pass_omp_device_lower); NEXT_PASS (pass_omp_target_link); diff --git a/gcc/target.def b/gcc/target.def index d82db232e40..c9c3f650e8a 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1735,6 +1735,14 @@ DEFHOOK void, (tree var), NULL) +DEFHOOK +(create_propagation_record, +"Create a record used to propagate local-variable state from an active\n\ +worker to other workers. A possible implementation might adjust the type\n\ +of REC to place the new variable in shared GPU memory.", +tree, (tree rec, bool sender, const char *name), +default_goacc_create_propagation_record) + DEFHOOK (explode_args, "Define this hook to TRUE if arguments to offload regions should be\n\ @@ -1742,6 +1750,11 @@ exploded, i.e. passed as true arguments rather than in an argument array.", bool, (void), hook_bool_void_false) +DEFHOOKPOD +(worker_partitioning, +"Use gimple transformation for worker neutering/broadcasting.", +bool, false) + HOOK_VECTOR_END (goacc) /* Functions relating to vectorization. */ diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 59436278dcf..29b0258f49f 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -125,6 +125,7 @@ extern bool default_goacc_validate_dims (tree, int [], int, unsigned); extern int default_goacc_dim_limit (int); extern bool default_goacc_fork_join (gcall *, const int [], bool); extern void default_goacc_reduction (gcall *); +extern tree default_goacc_create_propagation_record (tree, bool, const char *); /* These are here, and not in hooks.[ch], because not all users of hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS. 
*/ diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index f00c9199452..d1a1d3a7d4a 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -417,6 +417,8 @@ extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt); extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_oacc_loop_designation (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_oacc_gimple_workers (gcc::context *ctxt); extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt); extern gimple_opt_pass *make_pass_omp_device_lower (gcc::context *ctxt); extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);

From patchwork Thu Sep 5 01:45:52 2019
X-Patchwork-Submitter: Julian Brown
X-Patchwork-Id: 1158171
From: Julian Brown
CC: Andrew Stubbs
Subject: [PATCH 3/6] [og9] AMD GCN adjustments for middle-end worker partitioning
Date: Wed, 4 Sep 2019 18:45:52 -0700

This patch renames the TARGET_GOACC_ADJUST_PROPAGATION_RECORD hook introduced in the GCN backend by a previous merge to TARGET_GOACC_CREATE_PROPAGATION_RECORD, and removes a FIXME relating to missing worker-partitioning support.

Julian

ChangeLog

gcc/
	* config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename
	prototype to...
	(gcn_goacc_create_propagation_record): This.
	* config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename
	function to...
	(gcn_goacc_create_propagation_record): This. Adjust comment.
	* config/gcn/gcn.c (gcn_init_builtins): Override decls for
	BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START,
	BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER.
	(gcn_fork_join): Remove inaccurate comment.
	(TARGET_GOACC_ADJUST_PROPAGATION_RECORD): Rename to...
	(TARGET_GOACC_CREATE_PROPAGATION_RECORD): This.
--- gcc/ChangeLog.openacc | 15 +++++++++++++++ gcc/config/gcn/gcn-protos.h | 2 +- gcc/config/gcn/gcn-tree.c | 6 +++--- gcc/config/gcn/gcn.c | 11 +++-------- 4 files changed, 22 insertions(+), 12 deletions(-) diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc index a2b2dcfcf26..0d068ac8ae2 100644 --- a/gcc/ChangeLog.openacc +++ b/gcc/ChangeLog.openacc @@ -1,3 +1,18 @@ +2019-09-05 Julian Brown + + * config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename + prototype to... + (gcn_goacc_create_propagation_record): This. 
+ * config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename + function to... + (gcn_goacc_create_propagation_record): This. Adjust comment. + * config/gcn/gcn.c (gcn_init_builtins): Override decls for + BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START, + BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER. + (gcn_fork_join): Remove inaccurate comment. + (TARGET_GOACC_ADJUST_PROPAGATION_RECORD): Rename to... + (TARGET_GOACC_CREATE_PROPAGATION_RECORD): This. + 2019-09-05 Julian Brown * Makefile.in (OBJS): Add omp-sese.o. diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h index da7faf29c70..1711862c6a2 100644 --- a/gcc/config/gcn/gcn-protos.h +++ b/gcc/config/gcn/gcn-protos.h @@ -37,7 +37,7 @@ extern rtx gcn_full_exec (); extern rtx gcn_full_exec_reg (); extern rtx gcn_gen_undef (machine_mode); extern bool gcn_global_address_p (rtx); -extern tree gcn_goacc_adjust_propagation_record (tree record_type, bool sender, +extern tree gcn_goacc_create_propagation_record (tree record_type, bool sender, const char *name); extern void gcn_goacc_adjust_gangprivate_decl (tree var); extern void gcn_goacc_reduction (gcall *call); diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c index c6b6302e9ed..04902a39b29 100644 --- a/gcc/config/gcn/gcn-tree.c +++ b/gcc/config/gcn/gcn-tree.c @@ -667,12 +667,12 @@ gcn_goacc_reduction (gcall *call) } } -/* Implement TARGET_GOACC_ADJUST_PROPAGATION_RECORD. +/* Implement TARGET_GOACC_CREATE_PROPAGATION_RECORD. - Tweak (worker) propagation record, e.g. to put it in shared memory. */ + Create (worker) propagation record in shared memory. 
*/ tree -gcn_goacc_adjust_propagation_record (tree record_type, bool sender, +gcn_goacc_create_propagation_record (tree record_type, bool sender, const char *name) { tree type = record_type; diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index f3f112d95a9..ca9321b5f25 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -3468,8 +3468,6 @@ gcn_init_builtins (void) TREE_NOTHROW (gcn_builtin_decls[i]) = 1; } -/* FIXME: remove the ifdef once OpenACC support is merged upstream. */ -#ifdef BUILT_IN_GOACC_SINGLE_START /* These builtins need to take/return an LDS pointer: override the generic versions here. */ @@ -3486,7 +3484,6 @@ gcn_init_builtins (void) set_builtin_decl (BUILT_IN_GOACC_BARRIER, gcn_builtin_decls[GCN_BUILTIN_ACC_BARRIER], false); -#endif } /* Expand the CMP_SWAP GCN builtins. We have our own versions that do @@ -4765,8 +4762,6 @@ static bool gcn_fork_join (gcall *ARG_UNUSED (call), const int *ARG_UNUSED (dims), bool ARG_UNUSED (is_fork)) { - /* GCN does not use the fork/join concept invented for NVPTX. - Instead we use standard autovectorization. 
*/ return false; } @@ -6029,9 +6024,9 @@ print_operand (FILE *file, rtx x, int code) #define TARGET_FUNCTION_VALUE_REGNO_P gcn_function_value_regno_p #undef TARGET_GIMPLIFY_VA_ARG_EXPR #define TARGET_GIMPLIFY_VA_ARG_EXPR gcn_gimplify_va_arg_expr -#undef TARGET_GOACC_ADJUST_PROPAGATION_RECORD -#define TARGET_GOACC_ADJUST_PROPAGATION_RECORD \ - gcn_goacc_adjust_propagation_record +#undef TARGET_GOACC_CREATE_PROPAGATION_RECORD +#define TARGET_GOACC_CREATE_PROPAGATION_RECORD \ + gcn_goacc_create_propagation_record #undef TARGET_GOACC_ADJUST_GANGPRIVATE_DECL #define TARGET_GOACC_ADJUST_GANGPRIVATE_DECL gcn_goacc_adjust_gangprivate_decl #undef TARGET_GOACC_FORK_JOIN
From patchwork Thu Sep 5 01:45:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1158172
From: Julian Brown To: CC: Andrew Stubbs Subject: [PATCH 4/6] [og9] Fix up tests for oaccdevlow pass splitting Date: Wed, 4 Sep 2019 18:45:53 -0700 Message-ID: <40d6dc794b87eb2e51e294a11b83194fbbb02b8b.1567644180.git.julian@codesourcery.com>
This patch adjusts some tests after the splitting of the oaccdevlow pass into three passes. Julian ChangeLog gcc/testsuite/ * c-c++-common/goacc/classify-kernels-unparallelized.c, c-c++-common/goacc/classify-kernels.c, c-c++-common/goacc/classify-parallel.c, c-c++-common/goacc/classify-routine.c, gfortran.dg/goacc/classify-kernels-unparallelized.f95, gfortran.dg/goacc/classify-kernels.f95, gfortran.dg/goacc/classify-parallel.f95, gfortran.dg/goacc/classify-routine.f95: Scan oaccloops dump instead of oaccdevlow pass. 
--- gcc/testsuite/ChangeLog.openacc | 12 ++++++++++++ .../goacc/classify-kernels-unparallelized.c | 8 ++++---- gcc/testsuite/c-c++-common/goacc/classify-kernels.c | 8 ++++---- gcc/testsuite/c-c++-common/goacc/classify-parallel.c | 8 ++++---- gcc/testsuite/c-c++-common/goacc/classify-routine.c | 8 ++++---- .../goacc/classify-kernels-unparallelized.f95 | 8 ++++---- gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 | 8 ++++---- .../gfortran.dg/goacc/classify-parallel.f95 | 8 ++++---- gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 | 8 ++++---- 9 files changed, 44 insertions(+), 32 deletions(-) diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc index 8295fe61ba7..899b9cf1783 100644 --- a/gcc/testsuite/ChangeLog.openacc +++ b/gcc/testsuite/ChangeLog.openacc @@ -1,3 +1,15 @@ +2019-09-05 Julian Brown + + * c-c++-common/goacc/classify-kernels-unparallelized.c, + c-c++-common/goacc/classify-kernels.c, + c-c++-common/goacc/classify-parallel.c, + c-c++-common/goacc/classify-routine.c, + gfortran.dg/goacc/classify-kernels-unparallelized.f95, + gfortran.dg/goacc/classify-kernels.f95, + gfortran.dg/goacc/classify-parallel.f95, + gfortran.dg/goacc/classify-routine.f95: Scan oaccloops dump instead of + oaccdevlow pass. + 2019-07-10 Julian Brown * c-c++-common/goacc/mdc-1.c: Update clause matching patterns. 
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c index 9dad2de504c..f05fba9d31b 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c @@ -5,7 +5,7 @@ { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } { dg-additional-options "-fdump-tree-parloops1-all" } - { dg-additional-options "-fdump-tree-oaccdevlow" } */ + { dg-additional-options "-fdump-tree-oaccloops" } */ #define N 1024 @@ -36,6 +36,6 @@ void KERNELS () /* Check the offloaded function's classification and compute dimensions (will always be 1 x 1 x 1 for non-offloading compilation). - { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow" } } - { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } } - { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */ + { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } } + { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } + { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c index f1d46130685..009db79b018 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c @@ -5,7 +5,7 @@ { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } { dg-additional-options "-fdump-tree-parloops1-all" } - { dg-additional-options 
"-fdump-tree-oaccdevlow" } */ + { dg-additional-options "-fdump-tree-oaccloops" } */ #define N 1024 @@ -31,6 +31,6 @@ void KERNELS () /* Check the offloaded function's classification and compute dimensions (will always be 1 x 1 x 1 for non-offloading compilation). - { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow" } } - { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } } - { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */ + { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } } + { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } + { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c index 9c80efd65e1..fa2e476458e 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c @@ -4,7 +4,7 @@ /* { dg-additional-options "-O2" } { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } - { dg-additional-options "-fdump-tree-oaccdevlow" } */ + { dg-additional-options "-fdump-tree-oaccloops" } */ #define N 1024 @@ -24,6 +24,6 @@ void PARALLEL () /* Check the offloaded function's classification and compute dimensions (will always be 1 x 1 x 1 for non-offloading compilation). 
- { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccdevlow" } } - { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } } - { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */ + { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops" } } + { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } + { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccloops" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/classify-routine.c b/gcc/testsuite/c-c++-common/goacc/classify-routine.c index a4994b061f0..af0069a6f78 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-routine.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-routine.c @@ -4,7 +4,7 @@ /* { dg-additional-options "-O2" } { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } - { dg-additional-options "-fdump-tree-oaccdevlow" } */ + { dg-additional-options "-fdump-tree-oaccloops" } */ #define N 1024 @@ -26,6 +26,6 @@ void ROUTINE () /* Check the offloaded function's classification and compute dimensions (will always be 1 x 1 x 1 for non-offloading compilation). 
- { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccdevlow" } } - { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } } - { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccdevlow" } } */ + { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops" } } + { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } + { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccloops" } } */ diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 index 08772428c4c..6e4001b4f9b 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 @@ -5,7 +5,7 @@ ! { dg-additional-options "-fopt-info-optimized-omp" } ! { dg-additional-options "-fdump-tree-ompexp" } ! { dg-additional-options "-fdump-tree-parloops1-all" } -! { dg-additional-options "-fdump-tree-oaccdevlow" } +! { dg-additional-options "-fdump-tree-oaccloops" } program main implicit none @@ -37,6 +37,6 @@ end program main ! Check the offloaded function's classification and compute dimensions (will ! always be 1 x 1 x 1 for non-offloading compilation). -! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow" } } -! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } } -! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } +! 
{ dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } } +! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } +! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 index f2c4736e111..a0a5fd93bbc 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 @@ -5,7 +5,7 @@ ! { dg-additional-options "-fopt-info-optimized-omp" } ! { dg-additional-options "-fdump-tree-ompexp" } ! { dg-additional-options "-fdump-tree-parloops1-all" } -! { dg-additional-options "-fdump-tree-oaccdevlow" } +! { dg-additional-options "-fdump-tree-oaccloops" } program main implicit none @@ -33,6 +33,6 @@ end program main ! Check the offloaded function's classification and compute dimensions (will ! always be 1 x 1 x 1 for non-offloading compilation). -! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow" } } -! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } } -! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } +! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } } +! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } +! 
{ dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95 index a23ea81609b..ae3f322fb63 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95 @@ -4,7 +4,7 @@ ! { dg-additional-options "-O2" } ! { dg-additional-options "-fopt-info-optimized-omp" } ! { dg-additional-options "-fdump-tree-ompexp" } -! { dg-additional-options "-fdump-tree-oaccdevlow" } +! { dg-additional-options "-fdump-tree-oaccloops" } program main implicit none @@ -26,6 +26,6 @@ end program main ! Check the offloaded function's classification and compute dimensions (will ! always be 1 x 1 x 1 for non-offloading compilation). -! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccdevlow" } } -! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } } -! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccdevlow" } } +! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops" } } +! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } +! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccloops" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 index e435f5d7eae..272fb713ace 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 @@ -4,7 +4,7 @@ ! { dg-additional-options "-O2" } ! { dg-additional-options "-fopt-info-optimized-omp" } ! 
{ dg-additional-options "-fdump-tree-ompexp" } -! { dg-additional-options "-fdump-tree-oaccdevlow" } +! { dg-additional-options "-fdump-tree-oaccloops" } subroutine ROUTINE !$acc routine worker @@ -25,6 +25,6 @@ end subroutine ROUTINE ! Check the offloaded function's classification and compute dimensions (will ! always be 1 x 1 x 1 for non-offloading compilation). -! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccdevlow" } } -! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } } -! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target, oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccdevlow" } } +! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops" } } +! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } +! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target, oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccloops" } }
From patchwork Thu Sep 5 01:46:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1158176
From: Julian Brown To: CC: Andrew Stubbs Subject: [PATCH 5/6] [og9] Reference reduction localization Date: Wed, 4 Sep 2019 18:46:57 -0700 Message-ID:
This is a version of Cesar's patch to rewrite OpenACC reference reductions (e.g. Fortran function arguments) to use temporary scalar variables. This is necessary at present to avoid stack-slot clashes between multiple workers on AMD GCN. Julian ChangeLog gcc/ * gimplify.c (privatize_reduction): New struct. (localize_reductions_r, localize_reductions): New functions. (gimplify_omp_for): Call localize_reductions. (gimplify_omp_workshare): Likewise. * omp-low.c (lower_oacc_reductions): Handle localized reductions. Create fewer temp vars. * tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL documentation. * tree.c (omp_clause_num_ops): Bump number of ops for OMP_CLAUSE_REDUCTION to 6. (walk_tree_1): Adjust accordingly. * tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro. 
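For readers unfamiliar with the transformation, here is a rough serial analogy in plain C. This is an illustrative sketch only, not code from the patch, and the function names sum_by_reference and sum_localized are made up. The first function updates the reduction variable through its reference on every iteration, much as happens for a Fortran dummy argument. The second accumulates in a local scalar and stores the result back once, which is roughly the shape localize_reductions aims for. As far as the diff shows, the real copy-in and copy-out are done by the IFN_GOACC_REDUCTION calls in lower_oacc_reductions rather than by explicit assignments like these.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch, not part of the patch.  Both function names are
   hypothetical.  */

/* Before: the loop body reads and writes the reduction variable through
   the reference on every iteration.  */
void
sum_by_reference (const int *a, size_t n, int *sum)
{
  assert (sum != NULL);

  for (size_t i = 0; i < n; i++)
    *sum += a[i];
}

/* After: the loop body only touches a local scalar.  The referenced
   object is read once before the loop and written once after it.  */
void
sum_localized (const int *a, size_t n, int *sum)
{
  assert (sum != NULL);

  int local_sum = *sum;		/* Copy in.  */

  for (size_t i = 0; i < n; i++)
    local_sum += a[i];

  *sum = local_sum;		/* Copy out, once.  */
}
```

Both functions compute the same result. The point of the second form is that the loop body no longer dereferences the reference, so each worker can keep its own partial result in a local variable.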
--- gcc/ChangeLog.openacc | 16 +++++++ gcc/gimplify.c | 102 ++++++++++++++++++++++++++++++++++++++++++ gcc/omp-low.c | 47 ++++++------------- gcc/tree-core.h | 4 +- gcc/tree.c | 11 +++-- gcc/tree.h | 2 + 6 files changed, 145 insertions(+), 37 deletions(-) diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc index 0d068ac8ae2..2b7f616810d 100644 --- a/gcc/ChangeLog.openacc +++ b/gcc/ChangeLog.openacc @@ -1,3 +1,19 @@ +2019-09-05 Cesar Philippidis + Julian Brown + + * gimplify.c (privatize_reduction): New struct. + (localize_reductions_r, localize_reductions): New functions. + (gimplify_omp_for): Call localize_reductions. + (gimplify_omp_workshare): Likewise. + * omp-low.c (lower_oacc_reductions): Handle localized reductions. + Create fewer temp vars. + * tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL + documentation. + * tree.c (omp_clause_num_ops): Bump number of ops for + OMP_CLAUSE_REDUCTION to 6. + (walk_tree_1): Adjust accordingly. + * tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro. + 2019-09-05 Julian Brown * config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 58142c9eb90..685db1763e0 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -234,6 +234,11 @@ struct gimplify_omp_ctx hash_map *decl_data_clause; }; +struct privatize_reduction +{ + tree ref_var, local_var; +}; + static struct gimplify_ctx *gimplify_ctxp; static struct gimplify_omp_ctx *gimplify_omp_ctxp; @@ -10804,6 +10809,80 @@ find_combined_omp_for (tree *tp, int *walk_subtrees, void *data) return NULL_TREE; } +/* Helper function for localize_reductions. Replace all uses of REF_VAR with + LOCAL_VAR. 
*/ + +static tree +localize_reductions_r (tree *tp, int *walk_subtrees, void *data) +{ + enum tree_code tc = TREE_CODE (*tp); + struct privatize_reduction *pr = (struct privatize_reduction *) data; + + if (TYPE_P (*tp)) + *walk_subtrees = 0; + + switch (tc) + { + case INDIRECT_REF: + case MEM_REF: + if (TREE_OPERAND (*tp, 0) == pr->ref_var) + *tp = pr->local_var; + + *walk_subtrees = 0; + break; + + case VAR_DECL: + case PARM_DECL: + case RESULT_DECL: + if (*tp == pr->ref_var) + *tp = pr->local_var; + + *walk_subtrees = 0; + break; + + default: + break; + } + + return NULL_TREE; +} + +/* OpenACC worker and vector loop state propagation requires reductions + to be inside local variables. This function replaces all reference-type + reductions variables associated with the loop with a local copy. It is + also used to create private copies of reduction variables for those + which are not associated with acc loops. */ + +static void +localize_reductions (tree clauses, tree body) +{ + tree c, var, type, new_var; + struct privatize_reduction pr; + + for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION) + { + var = OMP_CLAUSE_DECL (c); + + if (!lang_hooks.decls.omp_privatize_by_reference (var)) + { + OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = NULL; + continue; + } + + type = TREE_TYPE (TREE_TYPE (var)); + new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var))); + + pr.ref_var = var; + pr.local_var = new_var; + + walk_tree (&body, localize_reductions_r, &pr, NULL); + + OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var; + } +} + + /* Gimplify the gross structure of an OMP_FOR statement. 
*/ static enum gimplify_status @@ -10989,6 +11068,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) gcc_unreachable (); } + if (ort == ORT_ACC) + { + gimplify_omp_ctx *outer = gimplify_omp_ctxp; + + while (outer + && outer->region_type != ORT_ACC_PARALLEL + && outer->region_type != ORT_ACC_KERNELS) + outer = outer->outer_context; + + /* FIXME: Reductions only work in parallel regions at present. We avoid + doing the reduction localization transformation in kernels regions + here, because the code to remove reductions in kernels regions cannot + handle that. */ + if (outer && outer->region_type == ORT_ACC_PARALLEL) + localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p)); + } + /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear clause for the IV. */ if (ort == ORT_SIMD && TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == 1) @@ -12154,6 +12250,12 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) || (ort & ORT_HOST_TEAMS) == ORT_HOST_TEAMS) { push_gimplify_context (); + + /* FIXME: Reductions are not supported in kernels regions yet. 
*/ + if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL) + localize_reductions (OMP_TARGET_CLAUSES (*expr_p), + OMP_TARGET_BODY (*expr_p)); + gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body); if (gimple_code (g) == GIMPLE_BIND) pop_gimplify_context (g); diff --git a/gcc/omp-low.c b/gcc/omp-low.c index fe911599142..4b21769a9a7 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -6129,9 +6129,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION) { tree orig = OMP_CLAUSE_DECL (c); - tree var = maybe_lookup_decl (orig, ctx); + tree var; tree ref_to_res = NULL_TREE; - tree incoming, outgoing, v1, v2, v3; + tree incoming, outgoing; bool is_private = false; bool is_fpp = false; @@ -6144,6 +6144,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, rcode = BIT_IOR_EXPR; tree op = build_int_cst (unsigned_type_node, rcode); + var = OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c); + if (!var) + var = maybe_lookup_decl (orig, ctx); if (!var) var = orig; @@ -6255,36 +6258,13 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, if (!ref_to_res) ref_to_res = integer_zero_node; - if (omp_is_reference (orig)) + if (omp_is_reference (outgoing)) { - tree type = TREE_TYPE (var); - const char *id = IDENTIFIER_POINTER (DECL_NAME (var)); - - if (!inner) - { - tree x = create_tmp_var (TREE_TYPE (type), id); - gimplify_assign (var, build_fold_addr_expr (x), fork_seq); - } - - v1 = create_tmp_var (type, id); - v2 = create_tmp_var (type, id); - v3 = create_tmp_var (type, id); - - gimplify_assign (v1, var, fork_seq); - gimplify_assign (v2, var, fork_seq); - gimplify_assign (v3, var, fork_seq); - - var = build_simple_mem_ref (var); - v1 = build_simple_mem_ref (v1); - v2 = build_simple_mem_ref (v2); - v3 = build_simple_mem_ref (v3); outgoing = build_simple_mem_ref (outgoing); if (!TREE_CONSTANT (incoming)) incoming = build_simple_mem_ref (incoming); } - 
else - v1 = v2 = v3 = var; /* Determine position in reduction buffer, which may be used by target. The parser has ensured that this is not a @@ -6317,20 +6297,21 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION, TREE_TYPE (var), 6, init_code, unshare_expr (ref_to_res), - v1, level, op, off); + var, level, op, off); tree fini_call = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION, TREE_TYPE (var), 6, fini_code, unshare_expr (ref_to_res), - v2, level, op, off); + var, level, op, off); tree teardown_call = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION, - TREE_TYPE (var), 6, teardown_code, - ref_to_res, v3, level, op, off); + TREE_TYPE (var), 6, + teardown_code, ref_to_res, var, + level, op, off); - gimplify_assign (v1, setup_call, &before_fork); - gimplify_assign (v2, init_call, &after_fork); - gimplify_assign (v3, fini_call, &before_join); + gimplify_assign (var, setup_call, &before_fork); + gimplify_assign (var, init_call, &after_fork); + gimplify_assign (var, fini_call, &before_join); gimplify_assign (outgoing, teardown_call, &after_join); } diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 5bcecc160bc..94a2582d49c 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -255,7 +255,9 @@ enum omp_clause_code { placeholder used in OMP_CLAUSE_REDUCTION_{INIT,MERGE}. Operand 4: OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER: Another dummy VAR_DECL placeholder, used like the above for C/C++ array - reductions. */ + reductions. + Operand 5: OMP_CLAUSE_REDUCTION_PRIVATE_DECL: A private VAR_DECL of + the original DECL associated with the reduction clause. */ OMP_CLAUSE_REDUCTION, /* OpenMP clause: task_reduction (operator:variable_list). 
*/ diff --git a/gcc/tree.c b/gcc/tree.c index 7c891dcbf91..089e2418747 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -283,7 +283,7 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_SHARED */ 1, /* OMP_CLAUSE_FIRSTPRIVATE */ 2, /* OMP_CLAUSE_LASTPRIVATE */ - 5, /* OMP_CLAUSE_REDUCTION */ + 6, /* OMP_CLAUSE_REDUCTION */ 5, /* OMP_CLAUSE_TASK_REDUCTION */ 5, /* OMP_CLAUSE_IN_REDUCTION */ 1, /* OMP_CLAUSE_COPYIN */ @@ -12361,11 +12361,16 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data, WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); case OMP_CLAUSE_REDUCTION: + { + for (int i = 0; i < 6; i++) + WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i)); + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); + } + case OMP_CLAUSE_TASK_REDUCTION: case OMP_CLAUSE_IN_REDUCTION: { - int i; - for (i = 0; i < 5; i++) + for (int i = 0; i < 5; i++) WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i)); WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); } diff --git a/gcc/tree.h b/gcc/tree.h index 2f2f109451a..ece8a6496b0 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1629,6 +1629,8 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (NODE, OMP_CLAUSE_REDUCTION, \ OMP_CLAUSE_IN_REDUCTION), 4) +#define OMP_CLAUSE_REDUCTION_PRIVATE_DECL(NODE) \ + OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_REDUCTION), 5) /* True if a REDUCTION clause may reference the original list item (omp_orig) in its OMP_CLAUSE_REDUCTION_{,GIMPLE_}INIT. 
*/
From patchwork Thu Sep 5 01:46:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1158174
From: Julian Brown To: CC: Andrew Stubbs Subject: [PATCH 6/6] [og9] Enable worker partitioning for AMD GCN Date: Wed, 4 Sep 2019 18:46:58 -0700 Message-ID: <79e7692178509467f622ecc649cda6aa8717406a.1567644180.git.julian@codesourcery.com>
This patch enables middle-end worker partitioning and multiple workers on AMD GCN. 

Julian

ChangeLog

    gcc/
    * config/gcn/gcn.c (gcn_goacc_validate_dims): Remove
    no-flag_worker-partitioning assertion.
    (TARGET_GOACC_WORKER_PARTITIONING): Define target hook to true.
    * config/gcn/gcn.opt (flag_worker_partitioning): Change default to 1.

    libgomp/
    * plugin/plugin-gcn.c (gcn_exec): Change default number of workers to
    16.
---
 gcc/ChangeLog.openacc       | 7 +++++++
 gcc/config/gcn/gcn.c        | 4 ++--
 gcc/config/gcn/gcn.opt      | 2 +-
 libgomp/ChangeLog.openacc   | 5 +++++
 libgomp/plugin/plugin-gcn.c | 4 +---
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 2b7f616810d..dde474d144d 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-09-05 Julian Brown
+
+    * config/gcn/gcn.c (gcn_goacc_validate_dims): Remove
+    no-flag_worker-partitioning assertion.
+    (TARGET_GOACC_WORKER_PARTITIONING): Define target hook to true.
+    * config/gcn/gcn.opt (flag_worker_partitioning): Change default to 1.
+
 2019-09-05 Cesar Philippidis
            Julian Brown
 
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index ca9321b5f25..b7cf6f093fa 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -4659,8 +4659,6 @@ gcn_goacc_validate_dims (tree decl, int dims[], int fn_level,
   /* FIXME: remove -facc-experimental-workers when they're ready. */
   int max_workers = flag_worker_partitioning ? 16 : 1;
 
-  gcc_assert (!flag_worker_partitioning);
-
   /* The vector size must appear to be 64, to the user, unless this is a SEQ
      routine. The real, internal value is always 1, which means use
      autovectorization, but the user should not see that. */
@@ -6035,6 +6033,8 @@ print_operand (FILE *file, rtx x, int code)
 #define TARGET_GOACC_REDUCTION gcn_goacc_reduction
 #undef TARGET_GOACC_VALIDATE_DIMS
 #define TARGET_GOACC_VALIDATE_DIMS gcn_goacc_validate_dims
+#undef TARGET_GOACC_WORKER_PARTITIONING
+#define TARGET_GOACC_WORKER_PARTITIONING true
 #undef TARGET_HARD_REGNO_MODE_OK
 #define TARGET_HARD_REGNO_MODE_OK gcn_hard_regno_mode_ok
 #undef TARGET_HARD_REGNO_NREGS
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 90d35f42e57..2fd3996edba 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -62,7 +62,7 @@ Target Report RejectNegative Var(flag_bypass_init_error)
 bool flag_worker_partitioning = false
 
 macc-experimental-workers
-Target Report Var(flag_worker_partitioning) Init(0)
+Target Report Var(flag_worker_partitioning) Init(1)
 
 int stack_size_opt = -1
 
diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index c7ef40e922c..438bd59b47b 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-09-05 Julian Brown
+
+    * plugin/plugin-gcn.c (gcn_exec): Change default number of workers to
+    16.
+
 2019-09-05 Julian Brown
 
     * testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 099f70b647c..f0b22ebc3d7 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3244,10 +3244,8 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs,
      problem size, so let's do a reasonable number of single-worker gangs.
      64 gangs matches a typical Fiji device. */
 
-  /* NOTE: Until support for middle-end worker partitioning is merged, use 1
-     for the default number of workers. */
   if (dims[0] == 0) dims[0] = 64; /* Gangs. */
-  if (dims[1] == 0) dims[1] = 1; /* Workers. */
+  if (dims[1] == 0) dims[1] = 16; /* Workers. */
 
   /* The incoming dimensions are expressed in terms of gangs, workers, and
      vectors. The HSA dimensions are expressed in terms of "work-items",