From patchwork Tue Aug 13 21:37:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1146676 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-506858-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="w9iI/LNq"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 467R005s3zz9sML for ; Wed, 14 Aug 2019 07:37:52 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; q=dns; s= default; b=aifPdjPegOsnneEFPpEefNdJur6iAYlhcf6EjzGX5yhmSDTioop+9 5ejoOKchKCo+a4In3RJ4dzgUefBeTjuOo+UDX3w8PNUnsrIe6xxxaFCIwNaQ/mSU RDyYbgYN7P8w829oQJJ0ufTj5IXZw5MYKTBkuTU06zXbELr9nOUL2c= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; s=default; bh=hceqde9b2lEnjBZWRViUd0skqWI=; b=w9iI/LNq6OZJ22r5qXQN9eeOIK8V 02zHUEisMM6hjhMz7uCXRliK7qle27BiHppI8MvFS8ByV27t6wBg8NoxvgL97ebe pkJC8ROPo+TGXQrK8JXnzyzxK9ZLPqMhvK46bzcSG/+pPa7+Xu0kfC8p3M0Vfybv DNPuj7ZVln3JjMM= Received: (qmail 48053 invoked by alias); 13 Aug 2019 21:37:36 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 48008 invoked by uid 89); 13 Aug 2019 21:37:36 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-16.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3 autolearn=ham version=3.3.1 spammy=outlined X-HELO: esa2.mentor.iphmx.com Received: from esa2.mentor.iphmx.com (HELO esa2.mentor.iphmx.com) (68.232.141.98) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 13 Aug 2019 21:37:34 +0000 IronPort-SDR: 1IBYMKZvGvz+0kqegSpqjutInB/BCk43aeuONC8KyetkLLgjmC745IJyNmvhJQ+OaEaqokoNjD vuqIssOcW0bMiYmctyB0KDVg10Gv/nlVZnGc5wjvFB4nrR08JAwfoDgfXVJxQLJPgmcEBonOAp W/iZxZ2hNuxoy9h9zKP/EO0kjCbG2A/S0rxNGkA61iUkDpApx/ikDv1fQ4DYxV1SeaOT+op7W7 kfa4NnU9MKFZe2xKuUuGpdGj4dvEQ46Z/TLknTzK91yXtTmPywFUZgnwQpNYvxHL6yKhyxTc1T d60= Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 13 Aug 2019 13:37:34 -0800 IronPort-SDR: vmc1H88R2/2JvAYGrjX5Jdwxlt2kxNvqY2usfd0I6Sdijz0TaDaKfJ7Y4Fj6kCida8pR7CQRCv umabQ2W3brXfChEsXhVsV0WBJSSqHkvLR2jhJ6+VgX4IkWNKy54jg8/rj0Sd1Rp6/1xNYgLOcR phF+SlV22IWM6BRuahFYjt6/pH5MgdDcMCCkFd1E9xwFaWXoiQylImJQFguYYozFv9Z4orBXQX qTD0+a06O6ngddlZ1H8u8Bwj1Pc8GqI1k47TGBS+afz03VWcxYV8Vlb9FOYzggjQtiTehigIU0 CkE= From: Julian Brown To: CC: Andrew Stubbs , Jakub Jelinek Subject: [PATCH 1/3] [og9] Wait at end of OpenACC asynchronous kernels regions Date: Tue, 13 Aug 2019 14:37:13 -0700 Message-ID: <79cc9084f24fec88df02daa5b099c8288ee06626.1565729221.git.julian@codesourcery.com> In-Reply-To: References: MIME-Version: 1.0 X-IsSubscribed: yes This patch provides a workaround for unreliable operation of asynchronous kernels regions on AMD GCN. At present, kernels regions are decomposed into a series of parallel regions surrounded by a data region capturing the data-movement clauses needed by the region as a whole: #pragma acc kernels async(n) { ... } is translated to: #pragma acc data copyin(...) copyout(...) { #pragma acc parallel async(n) present(...) { ... } #pragma acc parallel async(n) present(...) { ... } } This is however problematic for two reasons: - Variables mapped by the data clause will be unmapped immediately at the end of the data region, regardless of whether the inner asynchronous parallels have completed. (This causes crashes for GCN.) - Even if the "present" clause caused the reference count to stay above zero at the end of the data region -- which it doesn't -- the "present" clauses on the inner parallel regions would not cause "copyout" variables to be transferred back to the host at the appropriate time, i.e. when the async parallel region had completed. There is no "async" data construct in OpenACC, so the correct solution (which I am deferring on for now) is probably to use asynchronous "enter data" and "exit data" directives when translating asynchronous kernels regions instead. The attached patch just adds a "wait" operation before the end of the enclosing data region. This works, but introduces undesirable synchronisation with the host. Julian ChangeLog gcc/ * omp-oacc-kernels.c (add_wait): New function, split out of... (add_async_clauses_and_wait): ...here. Call new outlined function. (decompose_kernels_region_body): Add wait at the end of explicitly-asynchronous kernels regions. --- gcc/ChangeLog.openacc | 7 +++++++ gcc/omp-oacc-kernels.c | 28 +++++++++++++++++++++------- 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc index 84d80511603..a22f07c817c 100644 --- a/gcc/ChangeLog.openacc +++ b/gcc/ChangeLog.openacc @@ -1,3 +1,10 @@ +2019-08-13 Julian Brown + + * omp-oacc-kernels.c (add_wait): New function, split out of... + (add_async_clauses_and_wait): ...here. Call new outlined function. + (decompose_kernels_region_body): Add wait at the end of + explicitly-asynchronous kernels regions. + 2019-08-08 Julian Brown * config/gcn/gcn.c (gcn_goacc_validate_dims): Ensure diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c index 20913859c12..a6c4220f472 100644 --- a/gcc/omp-oacc-kernels.c +++ b/gcc/omp-oacc-kernels.c @@ -900,6 +900,18 @@ maybe_build_inner_data_region (location_t loc, gimple *body, return body; } +static void +add_wait (location_t loc, gimple_seq *region_body) +{ + /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0). */ + tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT); + tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC); + gimple *wait_call = gimple_build_call (wait_fn, 2, + sync_arg, integer_zero_node); + gimple_set_location (wait_call, loc); + gimple_seq_add_stmt (region_body, wait_call); +} + /* Helper function of decompose_kernels_region_body. The statements in REGION_BODY are expected to be decomposed parallel regions; add an "async" clause to each. Also add a "wait" pragma at the end of the @@ -923,13 +935,7 @@ add_async_clauses_and_wait (location_t loc, gimple_seq *region_body) gimple_omp_target_set_clauses (as_a (stmt), target_clauses); } - /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0). */ - tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT); - tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC); - gimple *wait_call = gimple_build_call (wait_fn, 2, - sync_arg, integer_zero_node); - gimple_set_location (wait_call, loc); - gimple_seq_add_stmt (region_body, wait_call); + add_wait (loc, region_body); } /* Auxiliary analysis of the body of a kernels region, to determine for each @@ -1378,6 +1384,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses) a wait directive at the end. */ if (async_clause == NULL) add_async_clauses_and_wait (loc, ®ion_body); + else + /* !!! If we have asynchronous parallel blocks inside a (synchronous) data + region, then target memory will get unmapped at the point the data + region ends, even if the inner asynchronous parallels have not yet + completed. For kernels marked "async", we might want to use "enter data + async(...)" and "exit data async(...)" instead. + For now, insert a (synchronous) wait at the end of the block. */ + add_wait (loc, ®ion_body); tree kernels_locals = gimple_bind_vars (as_a (kernels_body)); gimple *body = gimple_build_bind (kernels_locals, region_body,