From patchwork Tue Aug 13 21:37:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1146676 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-506858-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="w9iI/LNq"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 467R005s3zz9sML for ; Wed, 14 Aug 2019 07:37:52 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; q=dns; s= default; b=aifPdjPegOsnneEFPpEefNdJur6iAYlhcf6EjzGX5yhmSDTioop+9 5ejoOKchKCo+a4In3RJ4dzgUefBeTjuOo+UDX3w8PNUnsrIe6xxxaFCIwNaQ/mSU RDyYbgYN7P8w829oQJJ0ufTj5IXZw5MYKTBkuTU06zXbELr9nOUL2c= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; s=default; bh=hceqde9b2lEnjBZWRViUd0skqWI=; b=w9iI/LNq6OZJ22r5qXQN9eeOIK8V 02zHUEisMM6hjhMz7uCXRliK7qle27BiHppI8MvFS8ByV27t6wBg8NoxvgL97ebe pkJC8ROPo+TGXQrK8JXnzyzxK9ZLPqMhvK46bzcSG/+pPa7+Xu0kfC8p3M0Vfybv DNPuj7ZVln3JjMM= Received: (qmail 48053 invoked by alias); 13 Aug 2019 21:37:36 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 48008 invoked by uid 89); 13 Aug 2019 21:37:36 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-16.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3 autolearn=ham version=3.3.1 spammy=outlined X-HELO: esa2.mentor.iphmx.com Received: from esa2.mentor.iphmx.com (HELO esa2.mentor.iphmx.com) (68.232.141.98) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 13 Aug 2019 21:37:34 +0000 IronPort-SDR: 1IBYMKZvGvz+0kqegSpqjutInB/BCk43aeuONC8KyetkLLgjmC745IJyNmvhJQ+OaEaqokoNjD vuqIssOcW0bMiYmctyB0KDVg10Gv/nlVZnGc5wjvFB4nrR08JAwfoDgfXVJxQLJPgmcEBonOAp W/iZxZ2hNuxoy9h9zKP/EO0kjCbG2A/S0rxNGkA61iUkDpApx/ikDv1fQ4DYxV1SeaOT+op7W7 kfa4NnU9MKFZe2xKuUuGpdGj4dvEQ46Z/TLknTzK91yXtTmPywFUZgnwQpNYvxHL6yKhyxTc1T d60= Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 13 Aug 2019 13:37:34 -0800 IronPort-SDR: vmc1H88R2/2JvAYGrjX5Jdwxlt2kxNvqY2usfd0I6Sdijz0TaDaKfJ7Y4Fj6kCida8pR7CQRCv umabQ2W3brXfChEsXhVsV0WBJSSqHkvLR2jhJ6+VgX4IkWNKy54jg8/rj0Sd1Rp6/1xNYgLOcR phF+SlV22IWM6BRuahFYjt6/pH5MgdDcMCCkFd1E9xwFaWXoiQylImJQFguYYozFv9Z4orBXQX qTD0+a06O6ngddlZ1H8u8Bwj1Pc8GqI1k47TGBS+afz03VWcxYV8Vlb9FOYzggjQtiTehigIU0 CkE= From: Julian Brown To: CC: Andrew Stubbs , Jakub Jelinek Subject: [PATCH 1/3] [og9] Wait at end of OpenACC asynchronous kernels regions Date: Tue, 13 Aug 2019 14:37:13 -0700 Message-ID: <79cc9084f24fec88df02daa5b099c8288ee06626.1565729221.git.julian@codesourcery.com> In-Reply-To: References: MIME-Version: 1.0 X-IsSubscribed: yes This patch provides a workaround for unreliable operation of asynchronous kernels regions on AMD GCN. At present, kernels regions are decomposed into a series of parallel regions surrounded by a data region capturing the data-movement clauses needed by the region as a whole: #pragma acc kernels async(n) { ... } is translated to: #pragma acc data copyin(...) copyout(...) { #pragma acc parallel async(n) present(...) { ... } #pragma acc parallel async(n) present(...) { ... } } This is however problematic for two reasons: - Variables mapped by the data clause will be unmapped immediately at the end of the data region, regardless of whether the inner asynchronous parallels have completed. (This causes crashes for GCN.) - Even if the "present" clause caused the reference count to stay above zero at the end of the data region -- which it doesn't -- the "present" clauses on the inner parallel regions would not cause "copyout" variables to be transferred back to the host at the appropriate time, i.e. when the async parallel region had completed. There is no "async" data construct in OpenACC, so the correct solution (which I am deferring on for now) is probably to use asynchronous "enter data" and "exit data" directives when translating asynchronous kernels regions instead. The attached patch just adds a "wait" operation before the end of the enclosing data region. This works, but introduces undesirable synchronisation with the host. Julian ChangeLog gcc/ * omp-oacc-kernels.c (add_wait): New function, split out of... (add_async_clauses_and_wait): ...here. Call new outlined function. (decompose_kernels_region_body): Add wait at the end of explicitly-asynchronous kernels regions. --- gcc/ChangeLog.openacc | 7 +++++++ gcc/omp-oacc-kernels.c | 28 +++++++++++++++++++++------- 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc index 84d80511603..a22f07c817c 100644 --- a/gcc/ChangeLog.openacc +++ b/gcc/ChangeLog.openacc @@ -1,3 +1,10 @@ +2019-08-13 Julian Brown + + * omp-oacc-kernels.c (add_wait): New function, split out of... + (add_async_clauses_and_wait): ...here. Call new outlined function. + (decompose_kernels_region_body): Add wait at the end of + explicitly-asynchronous kernels regions. + 2019-08-08 Julian Brown * config/gcn/gcn.c (gcn_goacc_validate_dims): Ensure diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c index 20913859c12..a6c4220f472 100644 --- a/gcc/omp-oacc-kernels.c +++ b/gcc/omp-oacc-kernels.c @@ -900,6 +900,18 @@ maybe_build_inner_data_region (location_t loc, gimple *body, return body; } +static void +add_wait (location_t loc, gimple_seq *region_body) +{ + /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0). */ + tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT); + tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC); + gimple *wait_call = gimple_build_call (wait_fn, 2, + sync_arg, integer_zero_node); + gimple_set_location (wait_call, loc); + gimple_seq_add_stmt (region_body, wait_call); +} + /* Helper function of decompose_kernels_region_body. The statements in REGION_BODY are expected to be decomposed parallel regions; add an "async" clause to each. Also add a "wait" pragma at the end of the @@ -923,13 +935,7 @@ add_async_clauses_and_wait (location_t loc, gimple_seq *region_body) gimple_omp_target_set_clauses (as_a (stmt), target_clauses); } - /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0). */ - tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT); - tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC); - gimple *wait_call = gimple_build_call (wait_fn, 2, - sync_arg, integer_zero_node); - gimple_set_location (wait_call, loc); - gimple_seq_add_stmt (region_body, wait_call); + add_wait (loc, region_body); } /* Auxiliary analysis of the body of a kernels region, to determine for each @@ -1378,6 +1384,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses) a wait directive at the end. */ if (async_clause == NULL) add_async_clauses_and_wait (loc, ®ion_body); + else + /* !!! If we have asynchronous parallel blocks inside a (synchronous) data + region, then target memory will get unmapped at the point the data + region ends, even if the inner asynchronous parallels have not yet + completed. For kernels marked "async", we might want to use "enter data + async(...)" and "exit data async(...)" instead. + For now, insert a (synchronous) wait at the end of the block. */ + add_wait (loc, ®ion_body); tree kernels_locals = gimple_bind_vars (as_a (kernels_body)); gimple *body = gimple_build_bind (kernels_locals, region_body, From patchwork Tue Aug 13 21:37:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1146677 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-506859-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="HxEE5g98"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 467R0D4CCQz9sML for ; Wed, 14 Aug 2019 07:38:04 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; q=dns; s= default; b=S6JwpJUGkp5koPpHriTUPa3FtDJK7Euyi7rIgbkEXiKq3l/++/wPf icFOS+k6x/TxX8o5b9BpjlB4h1texVVKKXWZCDQFcOoqQqtNvIsru4FJnLEWD18r xPAHDYqJn6vFXYpvptVmHP2NdC8+MIqvZikdlVWY/pAciY/iaGeyAc= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; s=default; bh=LlYVH7zpMXho/oLoxnGjVmLJUDc=; b=HxEE5g98SMJtBVoKZ/t756Bunhi9 ghJbmXBnWTuOolU0AWwFdLbAMAI+ezA2unalzf649mOPNxfKE7IIDJXmeisE+zx+ IL0lTyTGILpY8+aa4NRFcmiE3Eq2MLAJ/8QJ23uEr9mLtIbjFknfESXDFMCnuOke FE1PzYj1v//BPUA= Received: (qmail 48677 invoked by alias); 13 Aug 2019 21:37:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 48615 invoked by uid 89); 13 Aug 2019 21:37:40 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-17.1 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_PASS autolearn=ham version=3.3.1 spammy=transfers X-HELO: esa2.mentor.iphmx.com Received: from esa2.mentor.iphmx.com (HELO esa2.mentor.iphmx.com) (68.232.141.98) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 13 Aug 2019 21:37:38 +0000 IronPort-SDR: TXJurYaz4w/sVS5UHkn4QyuD8ilpuTyVox7TGoqveCkEoBUs8W6+oziH7ki9cc1Kc5hwwj4jKo PIRcJE7AFTOUgIMn6kmdlpIl5EkD6iMf3vacYuqQEKuV6Ddn6hTLhPQCKvEgfqMDjpc/J60YYJ hYRzrSbAhFKMoejmWFseomrinhhKYhGY6mA+X6jPJpE1XDD1yEGBWty2+xxw4AnWhFw6w9LLPu vQgPx4xvwT+lv+PeLdPL6ceoTuVD6az+R/c5i00MgUgMiYcYlNTdTpsn/6SZoO7BfR7AkveZtY x/M= Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 13 Aug 2019 13:37:36 -0800 IronPort-SDR: pNc/Wpvlazx+etYed4DYV98Kd5jXvUjl5PBnVWQrSILH3vaBMmWJi4DI8TV560hIgnlUbb8tBP kNp+9DaRVD/EuuQhgBVc9E3/D0xQ3oTq0ikc8ki3l7bh5D3w60Mlwp/mdzJP1L34poyMjv70/e cH3q7oGt2k+Rqv/X1ui6Zqnce8e9XVSrbJvGR+ILvpZaD2QvLe55yHcbPF+0kUiePPjfSnF5ti MFMo7XZUAEPtapGXd/SMO5xPlwuE6Vikx7af3eH/YDc+CC7TJzFdxLJA+E4vYvqLKJm6w+v40+ AhU= From: Julian Brown To: CC: Andrew Stubbs , Jakub Jelinek Subject: [PATCH 2/3] [og9] Use temporary buffers for async host2dev copies Date: Tue, 13 Aug 2019 14:37:14 -0700 Message-ID: <6723cd26bad519660b91d8eb371d6c9d57876e72.1565729221.git.julian@codesourcery.com> In-Reply-To: References: MIME-Version: 1.0 X-IsSubscribed: yes In libgomp, host-to-device transfers are instigated in several places where the source data is either on the stack, or in an unstable heap location (i.e. which is immediately freed after performing the host-to-device transfer). When the transfer is asynchronous, this means that taking the address of source data and attempting the copy from that at some later point is extremely likely to fail. A previous fix for this problem (from our internal branch, and included with the AMD GCN offloading patches) attempted to separate transfers from the stack (performing them immediately) from transfers from the heap (which can safely be done some time later). Unfortunately that doesn't work well with more recent changes to libgomp and the GCN plugin. So instead, this patch copies the source data for asynchronous host-to-device copies immediately to a temporary buffer, then the transfer to the device can safely take place asynchronously some time later. Julian ChangeLog libgomp/ * plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field. (copy_data): Free temporary buffer if using. (queue_push_copy): Add using_src_copy parameter. (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_dev2host): Update calls to queue_push_copy. (GOMP_OFFLOAD_async_host2dev): Likewise. Allocate temporary buffer and copy source data to it immediately. * target.c (gomp_copy_host2dev): Update function comment. (copy_host2dev_immediate): Remove. (gomp_map_pointer, gomp_map_vars_internal): Replace calls to copy_host2dev_immediate with calls to gomp_copy_host2dev. --- libgomp/ChangeLog.openacc | 14 ++++++++++ libgomp/plugin/plugin-gcn.c | 20 ++++++++++--- libgomp/target.c | 56 +++++++++++++++---------------------- 3 files changed, 52 insertions(+), 38 deletions(-) diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc index 2279545f361..2a9a7f18ca2 100644 --- a/libgomp/ChangeLog.openacc +++ b/libgomp/ChangeLog.openacc @@ -1,3 +1,17 @@ +2019-08-13 Julian Brown + + * plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field. + (copy_data): Free temporary buffer if using. + (queue_push_copy): Add using_src_copy parameter. + (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_dev2host): Update calls to + queue_push_copy. + (GOMP_OFFLOAD_async_host2dev): Likewise. Allocate temporary buffer and + copy source data to it immediately. + * target.c (gomp_copy_host2dev): Update function comment. + (copy_host2dev_immediate): Remove. + (gomp_map_pointer, gomp_map_vars_internal): Replace calls to + copy_host2dev_immediate with calls to gomp_copy_host2dev. + 2019-08-08 Julian Brown * plugin/plugin-gcn.c (gcn_exec): Use 1 for the default number of diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index a41568b3306..65690e643ed 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3063,6 +3063,7 @@ struct copy_data const void *src; size_t len; bool use_hsa_memory_copy; + bool using_src_copy; struct goacc_asyncqueue *aq; }; @@ -3077,12 +3078,14 @@ copy_data (void *data_) hsa_fns.hsa_memory_copy_fn (data->dst, data->src, data->len); else memcpy (data->dst, data->src, data->len); + if (data->using_src_copy) + free ((void *) data->src); free (data); } static void queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, - size_t len, bool use_hsa_memory_copy) + size_t len, bool use_hsa_memory_copy, bool using_src_copy) { if (DEBUG_QUEUES) HSA_DEBUG ("queue_push_copy %d:%d: %zu bytes from (%p) to (%p)\n", @@ -3093,6 +3096,7 @@ queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, data->src = src; data->len = len; data->use_hsa_memory_copy = use_hsa_memory_copy; + data->using_src_copy = using_src_copy; data->aq = aq; queue_push_callback (aq, copy_data, data); } @@ -3137,7 +3141,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n) { struct agent_info *agent = get_agent_info (device); maybe_init_omp_async (agent); - queue_push_copy (agent->omp_async_queue, dst, src, n, false); + queue_push_copy (agent->omp_async_queue, dst, src, n, false, false); return true; } @@ -3469,7 +3473,15 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, void *dst, const void *src, { struct agent_info *agent = get_agent_info (device); assert (agent == aq->agent); - queue_push_copy (aq, dst, src, n, image_address_p (agent, dst)); + /* The source data does not necessarily remain live until the deferred + copy happens. Taking a snapshot of the data here avoids reading + uninitialised data later, but means that (a) data is copied twice and + (b) modifications to the copied data between the "spawning" point of + the asynchronous kernel and when it is executed will not be seen. + But, that is probably correct. */ + void *src_copy = GOMP_PLUGIN_malloc (n); + memcpy (src_copy, src, n); + queue_push_copy (aq, dst, src_copy, n, image_address_p (agent, dst), true); return true; } @@ -3479,7 +3491,7 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, void *dst, const void *src, { struct agent_info *agent = get_agent_info (device); assert (agent == aq->agent); - queue_push_copy (aq, dst, src, n, image_address_p (agent, src)); + queue_push_copy (aq, dst, src, n, image_address_p (agent, src), false); return true; } diff --git a/libgomp/target.c b/libgomp/target.c index 4645894f869..5f7f946e2ba 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -303,10 +303,9 @@ gomp_to_device_kind_p (int kind) } /* Copy host memory to an offload device. In asynchronous mode (if AQ is - non-NULL), this is only safe when the source memory is a global or heap - location (otherwise a copy may take place from a dangling pointer to an - expired stack frame). Use copy_host2dev_immediate for copies from stack - locations. */ + non-NULL), H may point to a stack location. It is up to the underlying + plugin to ensure that this data is read immediately, rather than at some + later point when the stack frame will likely have been destroyed. */ attribute_hidden void gomp_copy_host2dev (struct gomp_device_descr *devicep, @@ -346,17 +345,6 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, gomp_device_copy (devicep, devicep->host2dev_func, "dev", d, "host", h, sz); } -/* Use this variant for host-to-device copies from stack locations that may not - be live at the time an asynchronous copy operation takes place. */ - -static void -copy_host2dev_immediate (struct gomp_device_descr *devicep, void *d, - const void *h, size_t sz, - struct gomp_coalesce_buf *cbuf) -{ - gomp_copy_host2dev (devicep, NULL, d, h, sz, cbuf); -} - attribute_hidden void gomp_copy_dev2host (struct gomp_device_descr *devicep, struct goacc_asyncqueue *aq, @@ -617,10 +605,10 @@ gomp_map_pointer (struct target_mem_desc *tgt, struct goacc_asyncqueue *aq, if (cur_node.host_start == (uintptr_t) NULL) { cur_node.tgt_offset = (uintptr_t) NULL; - copy_host2dev_immediate (devicep, - (void *) (tgt->tgt_start + target_offset), - (void *) &cur_node.tgt_offset, - sizeof (void *), cbuf); + gomp_copy_host2dev (devicep, aq, + (void *) (tgt->tgt_start + target_offset), + (void *) &cur_node.tgt_offset, sizeof (void *), + cbuf); return; } /* Add bias to the pointer value. */ @@ -639,9 +627,8 @@ gomp_map_pointer (struct target_mem_desc *tgt, struct goacc_asyncqueue *aq, array section. Now subtract bias to get what we want to initialize the pointer with. */ cur_node.tgt_offset -= bias; - copy_host2dev_immediate (devicep, (void *) (tgt->tgt_start + target_offset), - (void *) &cur_node.tgt_offset, sizeof (void *), - cbuf); + gomp_copy_host2dev (devicep, aq, (void *) (tgt->tgt_start + target_offset), + (void *) &cur_node.tgt_offset, sizeof (void *), cbuf); } static void @@ -1460,13 +1447,13 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, cur_node.tgt_offset = gomp_map_val (tgt, hostaddrs, i - 1); if (cur_node.tgt_offset) cur_node.tgt_offset -= sizes[i]; - copy_host2dev_immediate (devicep, - (void *) (n->tgt->tgt_start - + n->tgt_offset - + cur_node.host_start - - n->host_start), - (void *) &cur_node.tgt_offset, - sizeof (void *), cbufp); + gomp_copy_host2dev (devicep, aq, + (void *) (n->tgt->tgt_start + + n->tgt_offset + + cur_node.host_start + - n->host_start), + (void *) &cur_node.tgt_offset, + sizeof (void *), cbufp); cur_node.tgt_offset = n->tgt->tgt_start + n->tgt_offset + cur_node.host_start - n->host_start; continue; @@ -1705,8 +1692,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset); /* We intentionally do not use coalescing here, as it's not data allocated by the current call to this function. */ - copy_host2dev_immediate (devicep, (void *) n->tgt_offset, - &tgt_addr, sizeof (void *), NULL); + gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset, + &tgt_addr, sizeof (void *), NULL); } array++; } @@ -1828,9 +1815,10 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, for (i = 0; i < mapnum; i++) { cur_node.tgt_offset = gomp_map_val (tgt, hostaddrs, i); - copy_host2dev_immediate (devicep, - (void *) (tgt->tgt_start + i * sizeof (void *)), - (void *) &cur_node.tgt_offset, sizeof (void *), cbufp); + gomp_copy_host2dev (devicep, aq, + (void *) (tgt->tgt_start + i * sizeof (void *)), + (void *) &cur_node.tgt_offset, sizeof (void *), + cbufp); } } From patchwork Tue Aug 13 21:37:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 1146678 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-506860-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="Rgp3+RY4"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 467R0R6gXzz9sML for ; Wed, 14 Aug 2019 07:38:15 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; q=dns; s= default; b=kaevycMngHm7ccxtfZnH0Y2/0y2zGG4U9e0PEBeUvMOLVBm/81gI6 FFdbfueRHABQJ+VwgNhtykloivRRaGaBvtRW/G/t2IKPzTMNn9lRFPpBXaWUDKgf buhKVdtQAfxpsmI4yEHyBIGiff/5NZmtnh5LiHQ7adYj9wpvwrFW90= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; s=default; bh=Qh03voLe2FyaFwYMjzvyuRA3DYQ=; b=Rgp3+RY4TzhRgpfMn4FISju2VinC JiJaJmFI+lbjFli8FpLe1gjYPdUedZGG6+PGaZnNil3Y8AT7FMjxMYYuPM4U3NOx PHmY/qAIYkr+fRRT8Md/ichv30Vte5hgcXhKIJXS/P4zcN3Pk6cuRCzTbOTK9nCh eXOYZbiITxGqXmE= Received: (qmail 48915 invoked by alias); 13 Aug 2019 21:37:42 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 48869 invoked by uid 89); 13 Aug 2019 21:37:42 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-17.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_PASS autolearn=ham version=3.3.1 spammy=agent, HContent-Transfer-Encoding:8bit X-HELO: esa2.mentor.iphmx.com Received: from esa2.mentor.iphmx.com (HELO esa2.mentor.iphmx.com) (68.232.141.98) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 13 Aug 2019 21:37:40 +0000 IronPort-SDR: XTdBh4U6MiPFJ/PrQq5WGgWbE5KRy53KKi08/7n2V2xehYkSyr3Uqb/DQ0Ipmqvo9VWPUgD7OO x+jdPXHYGFE/87kOIyAPz58ovnAb2DHrTLxIOGJT7YaZOQy/W30RCY2K4WWettadtOzCzFTjNt Zszj19hHvz6sZTLsPFKDgc/ZDBs6PhtPzRqqG2k78UkBQNbQavwNVDURFELmDzabLk0AZ87iVt zQawCFEKjcWR4o27XW3e6PLolLGDcq0RlTl4RB7qkRLxTbC5slnniRNe27ZLar1RjvaKF5lLcH UYY= Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 13 Aug 2019 13:37:40 -0800 IronPort-SDR: Cz8zCuNokM1TIjYY33YiNWeA/Tmr1nJLuU3Q9OcwXkojc6nKXpP7j3Qc5wOQw9k/k2aC83AHA5 QH4eBJW0ul0FeeYTd267JkstRISi7U10R31GltiOE2EvJrFL3aetSMAeIa89OtByfjq1PDFFiA klDcIvG69Jo2fKbyst5omicJdiZolmGy2JztUK+NVU3Outl0ez+gL2E6zcVyHOnMuF+/vP2aMn 75QNpFbj1OfmJb4vqTdw2lbmnu0WHf554aJW/TvJvScMqJzuAoni5Trfw9bZ1yc7Xsqd51aKXS M90= From: Julian Brown To: CC: Andrew Stubbs , Jakub Jelinek Subject: [PATCH 3/3] [og9] Wait on queue-full condition in AMD GCN libgomp offloading plugin Date: Tue, 13 Aug 2019 14:37:15 -0700 Message-ID: In-Reply-To: References: MIME-Version: 1.0 X-IsSubscribed: yes This patch lets the AMD GCN libgomp plugin wait for asynchronous queues to have some space to push new operations when they are full, rather than just erroring out immediately on that condition. This fixes the libgomp.oacc-c-c++-common/da-4.c test. Julian ChangeLog libgomp/ * plugin/plugin-gcn.c (queue_push_callback): Wait on queue-full condition. --- libgomp/ChangeLog.openacc | 5 +++++ libgomp/plugin/plugin-gcn.c | 11 +++++++++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc index 2a9a7f18ca2..f9d8e6ecd39 100644 --- a/libgomp/ChangeLog.openacc +++ b/libgomp/ChangeLog.openacc @@ -1,3 +1,8 @@ +2019-08-13 Julian Brown + + * plugin/plugin-gcn.c (queue_push_callback): Wait on queue-full + condition. + 2019-08-13 Julian Brown * plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field. diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 65690e643ed..099f70b647c 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -1416,8 +1416,15 @@ queue_push_callback (struct goacc_asyncqueue *aq, void (*fn)(void *), void *data) { if (aq->queue_n == ASYNC_QUEUE_SIZE) - GOMP_PLUGIN_fatal ("Async thread %d:%d: error: queue overflowed", - aq->agent->device_id, aq->id); + { + pthread_mutex_lock (&aq->mutex); + + /* Queue is full. Wait for it to not be full. */ + while (aq->queue_n == ASYNC_QUEUE_SIZE) + pthread_cond_wait (&aq->queue_cond_out, &aq->mutex); + + pthread_mutex_unlock (&aq->mutex); + } pthread_mutex_lock (&aq->mutex);