From patchwork Mon May 15 06:52:39 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 762286 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3wRB9n04cPz9s7k for ; Mon, 15 May 2017 16:53:15 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="iBwqRg5r"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:in-reply-to:references:date:message-id:mime-version :content-type:content-transfer-encoding; q=dns; s=default; b=YwR 2hI6R4cCd6rAfVZbUjbJHFWPoKEjsS6N0NdcDCZqxZhcUSQ3YjSytsGh3FJXqEW+ xXvwfmt9L9CxLz1g/5AVXcayAemD8azm870Q2IBvIMdOk9PRJWo8i7uRePJAuIj9 fwVzwQgLzkHC7CZjoevw3YbPJKqqk/YlC0r3IHk0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:in-reply-to:references:date:message-id:mime-version :content-type:content-transfer-encoding; s=default; bh=U/GKEx8Bh i07GmHEy0NcwTlN/s4=; b=iBwqRg5rkaq/G+j2AX+Pd9PaCj89qB8dgUAiwGhbW YsvI4JF/dn96jdWhTipLk96dwNB+9hrkJBL7RI8e25SKAmKT0tNyKcG64MSgW4gd 08KAvNsn4cDuyYRTSJ3F1JwvfhHcJ1t9txYyeuprQLKlZZ6g4CembmVUmG4HhQvj tE= Received: (qmail 50224 invoked by alias); 15 May 2017 06:52:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 50191 invoked by uid 89); 15 May 2017 06:52:54 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.4 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2 spammy=initiated, hasn't, hasnt X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 15 May 2017 06:52:47 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=svr-ies-mbx-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1dA9s7-0003RM-8M from Thomas_Schwinge@mentor.com for gcc-patches@gcc.gnu.org; Sun, 14 May 2017 23:52:47 -0700 Received: from hertz.schwinge.homeip.net (137.202.0.87) by svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) with Microsoft SMTP Server (TLS) id 15.0.1210.3; Mon, 15 May 2017 07:52:43 +0100 From: Thomas Schwinge To: Subject: More OpenACC 2.5 Profiling Interface (was: OpenACC 2.5 Profiling Interface (incomplete)) In-Reply-To: <87k28acit3.fsf@hertz.schwinge.homeip.net> References: <87k28acit3.fsf@hertz.schwinge.homeip.net> User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu) Date: Mon, 15 May 2017 08:52:39 +0200 Message-ID: <877f1i393c.fsf@hertz.schwinge.homeip.net> MIME-Version: 1.0 X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) Hi! On Tue, 28 Feb 2017 18:43:36 +0100, I wrote: > The 2.5 versions of the OpenACC standard added a new chapter "Profiling > Interface". In r245784, I committed incomplete support to > gomp-4_0-branch. I plan to continue working on this, but wanted to > synchronize at this point. > > commit b22a85fe7f3daeb48460e7aa28606d0cdb799f69 > Author: tschwinge > Date: Tue Feb 28 17:36:03 2017 +0000 > > OpenACC 2.5 Profiling Interface (incomplete) Committed to gomp-4_0-branch in r248042: commit e3720963a1f494b2a0a1b6c28d5eb8bfb7c0d546 Author: tschwinge Date: Mon May 15 06:50:17 2017 +0000 More OpenACC 2.5 Profiling Interface libgomp/ * oacc-async.c (acc_async_test, acc_async_test_all, acc_wait) (acc_wait_async, acc_wait_all, acc_wait_all_async): Set up profiling. * oacc-cuda.c (acc_get_current_cuda_device) (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Likewise. * oacc-init.c (acc_set_device_type, acc_get_device_type) (acc_get_device_num): Likewise. * oacc-mem.c (acc_malloc, acc_free, memcpy_tofrom_device) (acc_map_data, acc_unmap_data, present_create_copy) (delete_copyout, update_dev_host): Likewise. * oacc-parallel.c (GOACC_data_start, GOACC_data_end) (GOACC_enter_exit_data, GOACC_update, GOACC_wait): Likewise. * oacc-profiling.c (goacc_profiling_setup_p): New function. (goacc_profiling_dispatch_p): Add a "bool" formal parameter. Adjust all users. * oacc-int.h (goacc_profiling_setup_p) (goacc_profiling_dispatch_p): Update. * plugin/plugin-nvptx.c (nvptx_exec, nvptx_wait, nvptx_wait_all): Generate more profiling events. * libgomp.texi (OpenACC Profiling Interface): Update. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@248042 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgomp/ChangeLog.gomp | 24 +++ libgomp/libgomp.texi | 74 +++++++-- libgomp/oacc-async.c | 110 ++++++++++++- libgomp/oacc-cuda.c | 82 ++++++++-- libgomp/oacc-init.c | 102 +++++++++++- libgomp/oacc-int.h | 4 +- libgomp/oacc-mem.c | 154 +++++++++++++++++- libgomp/oacc-parallel.c | 357 +++++++++++++++++++++++++++++++++++++++--- libgomp/oacc-profiling.c | 100 +++++++++++- libgomp/plugin/plugin-nvptx.c | 113 ++++++++++++- 10 files changed, 1056 insertions(+), 64 deletions(-) Grüße Thomas diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp index 5dc0889..23882cf 100644 --- libgomp/ChangeLog.gomp +++ libgomp/ChangeLog.gomp @@ -1,3 +1,27 @@ +2017-05-15 Thomas Schwinge + + * oacc-async.c (acc_async_test, acc_async_test_all, acc_wait) + (acc_wait_async, acc_wait_all, acc_wait_all_async): Set up + profiling. + * oacc-cuda.c (acc_get_current_cuda_device) + (acc_get_current_cuda_context, acc_get_cuda_stream) + (acc_set_cuda_stream): Likewise. + * oacc-init.c (acc_set_device_type, acc_get_device_type) + (acc_get_device_num): Likewise. + * oacc-mem.c (acc_malloc, acc_free, memcpy_tofrom_device) + (acc_map_data, acc_unmap_data, present_create_copy) + (delete_copyout, update_dev_host): Likewise. + * oacc-parallel.c (GOACC_data_start, GOACC_data_end) + (GOACC_enter_exit_data, GOACC_update, GOACC_wait): Likewise. + * oacc-profiling.c (goacc_profiling_setup_p): New function. + (goacc_profiling_dispatch_p): Add a "bool" formal parameter. + Adjust all users. + * oacc-int.h (goacc_profiling_setup_p) + (goacc_profiling_dispatch_p): Update. + * plugin/plugin-nvptx.c (nvptx_exec, nvptx_wait, nvptx_wait_all): + Generate more profiling events. + * libgomp.texi (OpenACC Profiling Interface): Update. + 2017-05-14 Thomas Schwinge * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: New diff --git libgomp/libgomp.texi libgomp/libgomp.texi index 93365cd..b3fa139 100644 --- libgomp/libgomp.texi +++ libgomp/libgomp.texi @@ -3207,12 +3207,19 @@ Will be @code{acc_construct_parallel} for OpenACC kernels constructs; should be @code{acc_construct_kernels}. @item +Will be @code{acc_construct_enter_data} or +@code{acc_construct_exit_data} when processing variable mappings +specified in OpenACC declare directives; should be +@code{acc_construct_declare}. + +@item For implicit @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}, and explicit as well as implicit @code{acc_ev_alloc}, @code{acc_ev_free}, @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}, -@code{acc_ev_enqueue_download_start}, and -@code{acc_ev_enqueue_download_end}, will be +@code{acc_ev_enqueue_download_start}, +@code{acc_ev_enqueue_download_end}, @code{acc_ev_wait_start}, and +@code{acc_ev_wait_end}, will be @code{acc_construct_parallel}; should reflect the real parent construct. @@ -3221,8 +3228,9 @@ construct. @item @code{acc_event_info.*.implicit} For @code{acc_ev_alloc}, @code{acc_ev_free}, @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}, -@code{acc_ev_enqueue_download_start}, and -@code{acc_ev_enqueue_download_end}, this currently will be @code{1} +@code{acc_ev_enqueue_download_start}, +@code{acc_ev_enqueue_download_end}, @code{acc_ev_wait_start}, and +@code{acc_ev_wait_end}, this currently will be @code{1} also for explicit usage. @item @code{acc_event_info.data_event.var_name} @@ -3289,6 +3297,20 @@ it's not clear if they should be. @end itemize +@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end} +@itemize + +@item +Callbacks for these event types will also be invoked for OpenACC +host_data constructs; it's not clear if they should be. + +@item +Callbacks for these event types will also be invoked when processing +variable mappings specified in OpenACC declare directives; it's not +clear if they should be. + +@end itemize + @end table Callbacks for the following event types will be invoked, but dispatch @@ -3297,8 +3319,10 @@ and information provided therein has not yet been thoroughly reviewed: @itemize @item @code{acc_ev_alloc} @item @code{acc_ev_free} +@item @code{acc_ev_update_start}, @code{acc_ev_update_end} @item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end} @item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end} +@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end} @end itemize During device initialization, and finalization, respectively, @@ -3309,14 +3333,6 @@ callbacks for the following event types will not yet be invoked: @item @code{acc_ev_free} @end itemize -Callbacks for the following event types will currently only be invoked -for (implicit) events within compute constructs: - -@itemize -@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end} -@item @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end} -@end itemize - Callbacks for the following event types have not yet been implemented, so currently won't be invoked: @@ -3324,8 +3340,38 @@ so currently won't be invoked: @item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end} @item @code{acc_ev_runtime_shutdown} @item @code{acc_ev_create}, @code{acc_ev_delete} -@item @code{acc_ev_update_start}, @code{acc_ev_update_end} -@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end} +@end itemize + +For the following runtime library functions, not all expected +callbacks will be invoked (mostly concerning implicit device +initialization): + +@itemize +@item @code{acc_get_num_devices} +@item @code{acc_set_device_type} +@item @code{acc_get_device_type} +@item @code{acc_set_device_num} +@item @code{acc_get_device_num} +@item @code{acc_init} +@item @code{acc_shutdown} +@end itemize + +Aside from implicit device initialization, for the following runtime +library functions, no callbacks will be invoked for shared-memory +offloading devices (it's not clear if they should be): + +@itemize +@item @code{acc_malloc} +@item @code{acc_free} +@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async} +@item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async} +@item @code{acc_copyout}, @code{acc_copyout_async} +@item @code{acc_delete}, @code{acc_delete_async} +@item @code{acc_update_device}, @code{acc_update_device_async} +@item @code{acc_update_self}, @code{acc_update_self_async} +@item @code{acc_map_data}, @code{acc_unmap_data} +@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async} +@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async} @end itemize diff --git libgomp/oacc-async.c libgomp/oacc-async.c index 921f943..7cefa0f 100644 --- libgomp/oacc-async.c +++ libgomp/oacc-async.c @@ -39,10 +39,30 @@ acc_async_test (int async) struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + if (!thr || !thr->dev) gomp_fatal ("no device active"); - return thr->dev->openacc.async_test_func (async); + int res = thr->dev->openacc.async_test_func (async); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + + return res; } int @@ -50,10 +70,24 @@ acc_async_test_all (void) { struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (!thr || !thr->dev) gomp_fatal ("no device active"); - return thr->dev->openacc.async_test_all_func (); + int res = thr->dev->openacc.async_test_all_func (); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + + return res; } void @@ -64,10 +98,28 @@ acc_wait (int async) struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + if (!thr || !thr->dev) gomp_fatal ("no device active"); thr->dev->openacc.async_wait_func (async); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } void @@ -75,10 +127,28 @@ acc_wait_async (int async1, int async2) { struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async2; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + if (!thr || !thr->dev) gomp_fatal ("no device active"); thr->dev->openacc.async_wait_async_func (async1, async2); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } void @@ -86,10 +156,22 @@ acc_wait_all (void) { struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (!thr || !thr->dev) gomp_fatal ("no device active"); thr->dev->openacc.async_wait_all_func (); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } void @@ -100,15 +182,36 @@ acc_wait_all_async (int async) struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + if (!thr || !thr->dev) gomp_fatal ("no device active"); thr->dev->openacc.async_wait_all_async_func (async); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } int acc_get_default_async (void) { + /* In the following, no OpenACC Profiling Interface events can possibly be + generated. */ + struct goacc_thread *thr = goacc_thread (); if (!thr || !thr->dev) @@ -120,6 +223,9 @@ acc_get_default_async (void) void acc_set_default_async (int async) { + /* In the following, no OpenACC Profiling Interface events can possibly be + generated. */ + if (async < acc_async_sync) gomp_fatal ("invalid async argument: %d", async); diff --git libgomp/oacc-cuda.c libgomp/oacc-cuda.c index 86a2a77..325fc8d 100644 --- libgomp/oacc-cuda.c +++ libgomp/oacc-cuda.c @@ -36,10 +36,23 @@ acc_get_current_cuda_device (void) { struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + + void *ret = NULL; if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func) - return thr->dev->openacc.cuda.get_current_device_func (); + ret = thr->dev->openacc.cuda.get_current_device_func (); - return NULL; + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + + return ret; } void * @@ -47,10 +60,23 @@ acc_get_current_cuda_context (void) { struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + + void *ret = NULL; if (thr && thr->dev && thr->dev->openacc.cuda.get_current_context_func) - return thr->dev->openacc.cuda.get_current_context_func (); - - return NULL; + ret = thr->dev->openacc.cuda.get_current_context_func (); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + + return ret; } void * @@ -61,10 +87,29 @@ acc_get_cuda_stream (int async) if (async < 0) return NULL; + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + + void *ret = NULL; if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func) - return thr->dev->openacc.cuda.get_stream_func (async); + ret = thr->dev->openacc.cuda.get_stream_func (async); - return NULL; + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + + return ret; } int @@ -79,8 +124,27 @@ acc_set_cuda_stream (int async, void *stream) thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + + int ret = -1; if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func) - return thr->dev->openacc.cuda.set_stream_func (async, stream); + ret = thr->dev->openacc.cuda.set_stream_func (async, stream); - return -1; + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + + return ret; } diff --git libgomp/oacc-init.c libgomp/oacc-init.c index 415c0fa..c262caa 100644 --- libgomp/oacc-init.c +++ libgomp/oacc-init.c @@ -220,8 +220,23 @@ acc_dev_num_out_of_range (acc_device_t d, int ord, int ndevs) static struct gomp_device_descr * acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit) { + bool check_not_nested_p; + if (implicit) + { + /* In the implicit case, there should (TODO: must?) already be something + have been set up for an outer construct. */ + check_not_nested_p = false; + } + else + { + check_not_nested_p = true; + /* TODO: should we set "thr->prof_info" etc. in this case (acc_init)? + The problem is, that we don't have "thr" yet? (So, + "check_not_nested_p = true" also is pointless actually.) */ + } bool profiling_dispatch_p - = __builtin_expect (goacc_profiling_dispatch_p (), false); + = __builtin_expect (goacc_profiling_dispatch_p (check_not_nested_p), + false); acc_prof_info prof_info; if (profiling_dispatch_p) @@ -536,11 +551,21 @@ ialias (acc_shutdown) int acc_get_num_devices (acc_device_t d) { +#if 0 //TODO + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + prof_info.device_type = d; //TODO +#endif + int n = 0; struct gomp_device_descr *acc_dev; if (d == acc_device_none) - return 0; + goto out; gomp_init_targets_once (); @@ -549,12 +574,21 @@ acc_get_num_devices (acc_device_t d) gomp_mutex_unlock (&acc_device_lock); if (!acc_dev) - return 0; + goto out; n = acc_dev->get_num_devices_func (); if (n < 0) n = 0; + out: +#if 0 //TODO + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } +#endif + return n; } @@ -570,6 +604,14 @@ acc_set_device_type (acc_device_t d) struct gomp_device_descr *base_dev, *acc_dev; struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + prof_info.device_type = d; //TODO + gomp_mutex_lock (&acc_device_lock); if (!cached_base_dev) @@ -595,6 +637,12 @@ acc_set_device_type (acc_device_t d) } goacc_attach_host_thread_to_device (-1); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } ialias (acc_set_device_type) @@ -610,12 +658,25 @@ acc_get_device_type (void) res = acc_device_type (thr->base_dev->type); else { + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, + &prof_info, &api_info), + false); + gomp_init_targets_once (); gomp_mutex_lock (&acc_device_lock); dev = resolve_device (acc_device_default, true); gomp_mutex_unlock (&acc_device_lock); res = acc_device_type (dev->type); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } assert (res != acc_device_default @@ -632,6 +693,14 @@ acc_get_device_num (acc_device_t d) const struct gomp_device_descr *dev; struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + prof_info.device_type = d; //TODO + if (d >= _ACC_device_hwm) gomp_fatal ("unknown device type %u", (unsigned) d); @@ -642,6 +711,12 @@ acc_get_device_num (acc_device_t d) dev = resolve_device (d, true); gomp_mutex_unlock (&acc_device_lock); + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + if (thr && thr->base_dev == dev && thr->dev) return thr->dev->target_id; @@ -653,6 +728,19 @@ ialias (acc_get_device_num) void acc_set_device_num (int ord, acc_device_t d) { +#if 0 //TODO + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.device_type = d; //TODO + prof_info.device_type = ord; //TODO + } +#endif + struct gomp_device_descr *base_dev, *acc_dev; int num_devices; @@ -691,6 +779,14 @@ acc_set_device_num (int ord, acc_device_t d) } goacc_device_num = ord; + +#if 0 //TODO + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } +#endif } ialias (acc_set_device_num) diff --git libgomp/oacc-int.h libgomp/oacc-int.h index 8a62029..7f83516 100644 --- libgomp/oacc-int.h +++ libgomp/oacc-int.h @@ -110,7 +110,9 @@ void goacc_lazy_initialize (void); void goacc_host_init (void); void goacc_profiling_initialize (void); -bool goacc_profiling_dispatch_p (void); +bool goacc_profiling_setup_p (struct goacc_thread *, + acc_prof_info *, acc_api_info *); +bool goacc_profiling_dispatch_p (bool); void goacc_profiling_dispatch (acc_prof_info *, acc_event_info *, acc_api_info *); diff --git libgomp/oacc-mem.c libgomp/oacc-mem.c index 17e02b2..fd0dac4 100644 --- libgomp/oacc-mem.c +++ libgomp/oacc-mem.c @@ -103,12 +103,30 @@ acc_malloc (size_t s) struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + assert (thr->dev); + void *ret; if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) - return malloc (s); + { + /* TODO: Should we also generate acc_ev_alloc here? */ + ret = malloc (s); + } + else + ret = thr->dev->alloc_func (thr->dev->target_id, s); - return thr->dev->alloc_func (thr->dev->target_id, s); + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + + return ret; } /* OpenACC 2.0a (3.2.16) doesn't specify what to do in the event @@ -124,12 +142,23 @@ acc_free (void *d) struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + assert (thr && thr->dev); struct gomp_device_descr *acc_dev = thr->dev; if (acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) - return free (d); + { + /* TODO: Should we also generate acc_ev_free here? */ + free (d); + + goto out; + } gomp_mutex_lock (&acc_dev->lock); @@ -151,6 +180,13 @@ acc_free (void *d) if (!acc_dev->free_func (acc_dev->target_id, d)) gomp_fatal ("error in freeing device memory in %s", __FUNCTION__); + + out: + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } static void @@ -161,15 +197,31 @@ memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async, been obtained from a routine that did that. */ struct goacc_thread *thr = goacc_thread (); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + assert (thr && thr->dev); if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) { + /* TODO: Should we also generate + acc_ev_enqueue_upload_start/acc_ev_enqueue_upload_end or + acc_ev_enqueue_download_start/acc_ev_enqueue_download_end here? */ if (from) memmove (h, d, s); else memmove (d, h, s); - return; + + goto out; } if (async > acc_async_sync) @@ -184,6 +236,13 @@ memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async, if (!ret) gomp_fatal ("error in %s", libfnname); + + out: + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } void @@ -228,6 +287,9 @@ acc_deviceptr (void *h) if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return h; + /* In the following, no OpenACC Profiling Interface events can possibly be + generated. */ + gomp_mutex_lock (&dev->lock); n = lookup_host (dev, h, 1); @@ -265,6 +327,9 @@ acc_hostptr (void *d) if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return d; + /* In the following, no OpenACC Profiling Interface events can possibly be + generated. */ + gomp_mutex_lock (&acc_dev->lock); n = lookup_dev (acc_dev->openacc.data_environ, d, 1); @@ -302,6 +367,9 @@ acc_is_present (void *h, size_t s) if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return h != NULL; + /* In the following, no OpenACC Profiling Interface events can possibly be + generated. */ + gomp_mutex_lock (&acc_dev->lock); n = lookup_host (acc_dev, h, s); @@ -333,6 +401,12 @@ acc_map_data (void *h, void *d, size_t s) struct goacc_thread *thr = goacc_thread (); struct gomp_device_descr *acc_dev = thr->dev; + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) { if (d != h) @@ -372,6 +446,12 @@ acc_map_data (void *h, void *d, size_t s) tgt->prev = acc_dev->openacc.data_environ; acc_dev->openacc.data_environ = tgt; gomp_mutex_unlock (&acc_dev->lock); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } void @@ -386,6 +466,12 @@ acc_unmap_data (void *h) if (acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return; + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + size_t host_size; gomp_mutex_lock (&acc_dev->lock); @@ -436,6 +522,12 @@ acc_unmap_data (void *h) gomp_mutex_unlock (&acc_dev->lock); gomp_unmap_vars (t, true); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } #define FLAG_PRESENT (1 << 0) @@ -459,6 +551,18 @@ present_create_copy (unsigned f, void *h, size_t s, int async) if (acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return h; + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + gomp_mutex_lock (&acc_dev->lock); n = lookup_host (acc_dev, h, s); @@ -518,6 +622,12 @@ present_create_copy (unsigned f, void *h, size_t s, int async) gomp_mutex_unlock (&acc_dev->lock); } + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } + return d; } @@ -582,6 +692,18 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname) if (acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return; + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + gomp_mutex_lock (&acc_dev->lock); n = lookup_host (acc_dev, h, s); @@ -622,6 +744,12 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname) if (!acc_dev->free_func (acc_dev->target_id, d)) gomp_fatal ("error in freeing device memory in %s", libfnname); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } void @@ -664,6 +792,18 @@ update_dev_host (int is_dev, void *h, size_t s, int async) gomp_mutex_lock (&acc_dev->lock); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; //TODO + /* See . */ + prof_info.async_queue = prof_info.async; + } + n = lookup_host (acc_dev, h, s); if (!n) @@ -687,6 +827,12 @@ update_dev_host (int is_dev, void *h, size_t s, int async) acc_dev->openacc.async_set_async_func (acc_async_sync); gomp_mutex_unlock (&acc_dev->lock); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } void diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c index de70ac0..bff62ba 100644 --- libgomp/oacc-parallel.c +++ libgomp/oacc-parallel.c @@ -143,7 +143,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void *), acc_dev = thr->dev; bool profiling_dispatch_p - = __builtin_expect (goacc_profiling_dispatch_p (), false); + = __builtin_expect (goacc_profiling_dispatch_p (true), false); acc_prof_info prof_info; if (profiling_dispatch_p) @@ -407,18 +407,86 @@ GOACC_data_start (int device, size_t mapnum, struct goacc_thread *thr = goacc_thread (); struct gomp_device_descr *acc_dev = thr->dev; + bool profiling_dispatch_p + = __builtin_expect (goacc_profiling_dispatch_p (true), false); + + acc_prof_info prof_info; + if (profiling_dispatch_p) + { + thr->prof_info = &prof_info; + + prof_info.event_type = acc_ev_enter_data_start; + prof_info.valid_bytes = _ACC_PROF_INFO_VALID_BYTES; + prof_info.version = _ACC_PROF_INFO_VERSION; + prof_info.device_type = acc_device_type (acc_dev->type); + prof_info.device_number = acc_dev->target_id; + prof_info.thread_id = -1; //TODO + prof_info.async = acc_async_sync; /* Always synchronous. */ + /* See . */ + prof_info.async_queue = prof_info.async; + prof_info.src_file = NULL; //TODO + prof_info.func_name = NULL; //TODO + prof_info.line_no = -1; //TODO + prof_info.end_line_no = -1; //TODO + prof_info.func_line_no = -1; //TODO + prof_info.func_end_line_no = -1; //TODO + } + acc_event_info enter_data_event_info; + if (profiling_dispatch_p) + { + enter_data_event_info.other_event.event_type + = prof_info.event_type; + enter_data_event_info.other_event.valid_bytes + = _ACC_OTHER_EVENT_INFO_VALID_BYTES; + enter_data_event_info.other_event.parent_construct = acc_construct_data; + for (int i = 0; i < mapnum; ++i) + if (kinds[i] == GOMP_MAP_USE_DEVICE_PTR) + { + /* If there is one such data mapping kind, then this is actually an + OpenACC host_data construct. (GCC maps the OpenACC host_data + construct to the OpenACC data construct.) Apart from artificial + test cases (such as an OpenACC host_data construct's (implicit) + device initialization when there hasn't been any device data be + set up before...), there can't really any meaningful events be + generated from OpenACC host_data constructs, though. */ + enter_data_event_info.other_event.parent_construct + = acc_construct_host_data; + break; + } + enter_data_event_info.other_event.implicit = 0; + enter_data_event_info.other_event.tool_info = NULL; + } + acc_api_info api_info; + if (profiling_dispatch_p) + { + thr->api_info = &api_info; + + api_info.device_api = acc_device_api_none; + api_info.valid_bytes = _ACC_API_INFO_VALID_BYTES; + api_info.device_type = prof_info.device_type; + api_info.vendor = -1; //TODO + api_info.device_handle = NULL; //TODO + api_info.context_handle = NULL; //TODO + api_info.async_handle = NULL; //TODO + } + + if (profiling_dispatch_p) + goacc_profiling_dispatch (&prof_info, &enter_data_event_info, &api_info); + handle_ftn_pointers (mapnum, hostaddrs, sizes, kinds); /* Host fallback or 'do nothing'. */ if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) || host_fallback) { + //TODO + prof_info.device_type = acc_device_host; + api_info.device_type = prof_info.device_type; tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, GOMP_MAP_VARS_OPENACC); tgt->prev = thr->mapped_data; thr->mapped_data = tgt; - - return; + goto out; } gomp_debug (0, " %s: prepare mappings\n", __FUNCTION__); @@ -427,18 +495,92 @@ GOACC_data_start (int device, size_t mapnum, gomp_debug (0, " %s: mappings prepared\n", __FUNCTION__); tgt->prev = thr->mapped_data; thr->mapped_data = tgt; + + out: + if (profiling_dispatch_p) + { + prof_info.event_type = acc_ev_enter_data_end; + enter_data_event_info.other_event.event_type = prof_info.event_type; + goacc_profiling_dispatch (&prof_info, &enter_data_event_info, &api_info); + + thr->prof_info = NULL; + thr->api_info = NULL; + } } void GOACC_data_end (void) { struct goacc_thread *thr = goacc_thread (); + struct gomp_device_descr *acc_dev = thr->dev; struct target_mem_desc *tgt = thr->mapped_data; + bool profiling_dispatch_p + = __builtin_expect (goacc_profiling_dispatch_p (true), false); + + acc_prof_info prof_info; + if (profiling_dispatch_p) + { + thr->prof_info = &prof_info; + + prof_info.event_type = acc_ev_exit_data_start; + prof_info.valid_bytes = _ACC_PROF_INFO_VALID_BYTES; + prof_info.version = _ACC_PROF_INFO_VERSION; + prof_info.device_type = acc_device_type (acc_dev->type); + prof_info.device_number = acc_dev->target_id; + prof_info.thread_id = -1; //TODO + prof_info.async = acc_async_sync; /* Always synchronous. */ + /* See . */ + prof_info.async_queue = prof_info.async; + prof_info.src_file = NULL; //TODO + prof_info.func_name = NULL; //TODO + prof_info.line_no = -1; //TODO + prof_info.end_line_no = -1; //TODO + prof_info.func_line_no = -1; //TODO + prof_info.func_end_line_no = -1; //TODO + } + acc_event_info exit_data_event_info; + if (profiling_dispatch_p) + { + exit_data_event_info.other_event.event_type + = prof_info.event_type; + exit_data_event_info.other_event.valid_bytes + = _ACC_OTHER_EVENT_INFO_VALID_BYTES; + exit_data_event_info.other_event.parent_construct = acc_construct_data; + exit_data_event_info.other_event.implicit = 0; + exit_data_event_info.other_event.tool_info = NULL; + } + acc_api_info api_info; + if (profiling_dispatch_p) + { + thr->api_info = &api_info; + + api_info.device_api = acc_device_api_none; + api_info.valid_bytes = _ACC_API_INFO_VALID_BYTES; + api_info.device_type = prof_info.device_type; + api_info.vendor = -1; //TODO + api_info.device_handle = NULL; //TODO + api_info.context_handle = NULL; //TODO + api_info.async_handle = NULL; //TODO + } + + if (profiling_dispatch_p) + goacc_profiling_dispatch (&prof_info, &exit_data_event_info, &api_info); + gomp_debug (0, " %s: restore mappings\n", __FUNCTION__); thr->mapped_data = tgt->prev; gomp_unmap_vars (tgt, true); gomp_debug (0, " %s: mappings restored\n", __FUNCTION__); + + if (profiling_dispatch_p) + { + prof_info.event_type = acc_ev_exit_data_end; + exit_data_event_info.other_event.event_type = prof_info.event_type; + goacc_profiling_dispatch (&prof_info, &exit_data_event_info, &api_info); + + thr->prof_info = NULL; + thr->api_info = NULL; + } } void @@ -452,26 +594,6 @@ GOACC_enter_exit_data (int device, size_t mapnum, bool data_enter = false; size_t i; - goacc_lazy_initialize (); - - thr = goacc_thread (); - acc_dev = thr->dev; - - if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) - || host_fallback) - return; - - if (num_waits) - { - va_list ap; - - va_start (ap, num_waits); - goacc_wait (async, num_waits, &ap); - va_end (ap); - } - - acc_dev->openacc.async_set_async_func (async); - /* Determine if this is an "acc enter data". */ for (i = 0; i < mapnum; ++i) { @@ -501,6 +623,86 @@ GOACC_enter_exit_data (int device, size_t mapnum, kind); } + goacc_lazy_initialize (); + + thr = goacc_thread (); + acc_dev = thr->dev; + + bool profiling_dispatch_p + = __builtin_expect (goacc_profiling_dispatch_p (true), false); + + acc_prof_info prof_info; + if (profiling_dispatch_p) + { + thr->prof_info = &prof_info; + + prof_info.event_type + = data_enter ? acc_ev_enter_data_start : acc_ev_exit_data_start; + prof_info.valid_bytes = _ACC_PROF_INFO_VALID_BYTES; + prof_info.version = _ACC_PROF_INFO_VERSION; + prof_info.device_type = acc_device_type (acc_dev->type); + prof_info.device_number = acc_dev->target_id; + prof_info.thread_id = -1; //TODO + prof_info.async = async; + /* See . */ + prof_info.async_queue = prof_info.async; + prof_info.src_file = NULL; //TODO + prof_info.func_name = NULL; //TODO + prof_info.line_no = -1; //TODO + prof_info.end_line_no = -1; //TODO + prof_info.func_line_no = -1; //TODO + prof_info.func_end_line_no = -1; //TODO + } + acc_event_info enter_exit_data_event_info; + if (profiling_dispatch_p) + { + enter_exit_data_event_info.other_event.event_type + = prof_info.event_type; + enter_exit_data_event_info.other_event.valid_bytes + = _ACC_OTHER_EVENT_INFO_VALID_BYTES; + enter_exit_data_event_info.other_event.parent_construct + = data_enter ? acc_construct_enter_data : acc_construct_exit_data; + enter_exit_data_event_info.other_event.implicit = 0; + enter_exit_data_event_info.other_event.tool_info = NULL; + } + acc_api_info api_info; + if (profiling_dispatch_p) + { + thr->api_info = &api_info; + + api_info.device_api = acc_device_api_none; + api_info.valid_bytes = _ACC_API_INFO_VALID_BYTES; + api_info.device_type = prof_info.device_type; + api_info.vendor = -1; //TODO + api_info.device_handle = NULL; //TODO + api_info.context_handle = NULL; //TODO + api_info.async_handle = NULL; //TODO + } + + if (profiling_dispatch_p) + goacc_profiling_dispatch (&prof_info, &enter_exit_data_event_info, + &api_info); + + if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + || host_fallback) + { + //TODO + prof_info.device_type = acc_device_host; + api_info.device_type = prof_info.device_type; + goto out; + } + + if (num_waits) + { + va_list ap; + + va_start (ap, num_waits); + goacc_wait (async, num_waits, &ap); + va_end (ap); + } + + acc_dev->openacc.async_set_async_func (async); + /* In c, non-pointers and arrays are represented by a single data clause. Dynamically allocated arrays and subarrays are represented by a data clause followed by an internal GOMP_MAP_POINTER. @@ -603,6 +805,18 @@ GOACC_enter_exit_data (int device, size_t mapnum, } acc_dev->openacc.async_set_async_func (acc_async_sync); + + out: + if (profiling_dispatch_p) + { + prof_info.event_type = data_enter ? acc_ev_enter_data_end: acc_ev_exit_data_end; + enter_exit_data_event_info.other_event.event_type = prof_info.event_type; + goacc_profiling_dispatch (&prof_info, &enter_exit_data_event_info, + &api_info); + + thr->prof_info = NULL; + thr->api_info = NULL; + } } static void @@ -642,9 +856,66 @@ GOACC_update (int device, size_t mapnum, struct goacc_thread *thr = goacc_thread (); struct gomp_device_descr *acc_dev = thr->dev; + bool profiling_dispatch_p + = __builtin_expect (goacc_profiling_dispatch_p (true), false); + + acc_prof_info prof_info; + if (profiling_dispatch_p) + { + thr->prof_info = &prof_info; + + prof_info.event_type = acc_ev_update_start; + prof_info.valid_bytes = _ACC_PROF_INFO_VALID_BYTES; + prof_info.version = _ACC_PROF_INFO_VERSION; + prof_info.device_type = acc_device_type (acc_dev->type); + prof_info.device_number = acc_dev->target_id; + prof_info.thread_id = -1; //TODO + prof_info.async = async; + /* See . */ + prof_info.async_queue = prof_info.async; + prof_info.src_file = NULL; //TODO + prof_info.func_name = NULL; //TODO + prof_info.line_no = -1; //TODO + prof_info.end_line_no = -1; //TODO + prof_info.func_line_no = -1; //TODO + prof_info.func_end_line_no = -1; //TODO + } + acc_event_info update_event_info; + if (profiling_dispatch_p) + { + update_event_info.other_event.event_type + = prof_info.event_type; + update_event_info.other_event.valid_bytes + = _ACC_OTHER_EVENT_INFO_VALID_BYTES; + update_event_info.other_event.parent_construct = acc_construct_update; + update_event_info.other_event.implicit = 0; + update_event_info.other_event.tool_info = NULL; + } + acc_api_info api_info; + if (profiling_dispatch_p) + { + thr->api_info = &api_info; + + api_info.device_api = acc_device_api_none; + api_info.valid_bytes = _ACC_API_INFO_VALID_BYTES; + api_info.device_type = prof_info.device_type; + api_info.vendor = -1; //TODO + api_info.device_handle = NULL; //TODO + api_info.context_handle = NULL; //TODO + api_info.async_handle = NULL; //TODO + } + + if (profiling_dispatch_p) + goacc_profiling_dispatch (&prof_info, &update_event_info, &api_info); + if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) || host_fallback) - return; + { + //TODO + prof_info.device_type = acc_device_host; + api_info.device_type = prof_info.device_type; + goto out; + } if (num_waits) { @@ -718,11 +989,41 @@ GOACC_update (int device, size_t mapnum, } acc_dev->openacc.async_set_async_func (acc_async_sync); + + out: + if (profiling_dispatch_p) + { + prof_info.event_type = acc_ev_update_end; + update_event_info.other_event.event_type = prof_info.event_type; + goacc_profiling_dispatch (&prof_info, &update_event_info, &api_info); + + thr->prof_info = NULL; + thr->api_info = NULL; + } } void GOACC_wait (int async, int num_waits, ...) { + goacc_lazy_initialize (); + + struct goacc_thread *thr = goacc_thread (); + + /* No nesting. */ + assert (thr->prof_info == NULL); + assert (thr->api_info == NULL); + acc_prof_info prof_info; + acc_api_info api_info; + bool profiling_setup_p + = __builtin_expect (goacc_profiling_setup_p (thr, &prof_info, &api_info), + false); + if (profiling_setup_p) + { + prof_info.async = async; + /* See . */ + prof_info.async_queue = prof_info.async; + } + if (num_waits) { va_list ap; @@ -734,7 +1035,13 @@ GOACC_wait (int async, int num_waits, ...) else if (async == acc_async_sync) acc_wait_all (); else if (async == acc_async_noval) - goacc_thread ()->dev->openacc.async_wait_all_async_func (acc_async_noval); + thr->dev->openacc.async_wait_all_async_func (acc_async_noval); + + if (profiling_setup_p) + { + thr->prof_info = NULL; + thr->api_info = NULL; + } } int diff --git libgomp/oacc-profiling.c libgomp/oacc-profiling.c index a4671f9..35d652c 100644 --- libgomp/oacc-profiling.c +++ libgomp/oacc-profiling.c @@ -485,10 +485,90 @@ acc_prof_unregister (acc_event_t ev, acc_prof_callback cb, acc_register_t reg) gomp_mutex_unlock (&goacc_prof_lock); } +/* Set up to dispatch events? */ + +bool +goacc_profiling_setup_p (struct goacc_thread *thr, + acc_prof_info *prof_info, acc_api_info *api_info) +{ + //TODO + gomp_debug (0, "%s (%p)\n", __FUNCTION__, thr); + + /* If we don't have any per-thread state yet, we can't register prof_info and + api_info. */ + /* TODO: In this case, should we actually call goacc_lazy_initialize here, + and return the "thr" from goacc_profiling_setup_p? */ + if (__builtin_expect (thr == NULL, false)) + { + //TODO + gomp_debug (0, "Can't generate OpenACC Profiling Interface events for" + " the current call, construct, or directive\n"); + return false; + } + + bool profiling_dispatch_p + = __builtin_expect (goacc_profiling_dispatch_p (false), false); + if (thr->prof_info != NULL) + { + assert (profiling_dispatch_p); //TODO + /* Profiling has already been set up for an outer construct. In this + case, we continue to use the existing information, and thus return + "false" here. + + This can happen, for example, for an enter data directive, which sets + up profiling, then calls into acc_copyin, which should not again set + up profiling, should not overwrite the existing information. */ + //TODO: Is this all kosher? + return false; + } + + if (profiling_dispatch_p) + { + thr->prof_info = prof_info; + + prof_info->event_type = -1; /* Must be set later. */ + prof_info->valid_bytes = _ACC_PROF_INFO_VALID_BYTES; + prof_info->version = _ACC_PROF_INFO_VERSION; + //TODO + if (thr->dev) + { + prof_info->device_type = acc_device_type (thr->dev->type); + prof_info->device_number = thr->dev->target_id; + } + else + { + prof_info->device_type = -1; + prof_info->device_number = -1; + } + prof_info->thread_id = -1; //TODO + prof_info->async = acc_async_sync; //TODO + /* See . */ + prof_info->async_queue = prof_info->async; + prof_info->src_file = NULL; //TODO + prof_info->func_name = NULL; //TODO + prof_info->line_no = -1; //TODO + prof_info->end_line_no = -1; //TODO + prof_info->func_line_no = -1; //TODO + prof_info->func_end_line_no = -1; //TODO + + thr->api_info = api_info; + + api_info->device_api = acc_device_api_none; //TODO + api_info->valid_bytes = _ACC_API_INFO_VALID_BYTES; + api_info->device_type = prof_info->device_type; + api_info->vendor = -1; //TODO + api_info->device_handle = NULL; //TODO + api_info->context_handle = NULL; //TODO + api_info->async_handle = NULL; //TODO + } + + return profiling_dispatch_p; +} + /* Prepare to dispatch events? */ bool -goacc_profiling_dispatch_p (void) +goacc_profiling_dispatch_p (bool check_not_nested_p) { //TODO gomp_debug (0, "%s\n", __FUNCTION__); @@ -504,11 +584,21 @@ goacc_profiling_dispatch_p (void) //TODO gomp_debug (0, " %s: don't have any per-thread state yet\n", __FUNCTION__); } - else if (__builtin_expect (!thr->prof_callbacks_enabled, true)) + else { - //TODO - gomp_debug (0, " %s: disabled for this thread\n", __FUNCTION__); - return false; + if (check_not_nested_p) + { + /* No nesting. */ + assert (thr->prof_info == NULL); + assert (thr->api_info == NULL); + } + + if (__builtin_expect (!thr->prof_callbacks_enabled, true)) + { + //TODO + gomp_debug (0, " %s: disabled for this thread\n", __FUNCTION__); + return false; + } } gomp_mutex_lock (&goacc_prof_lock); diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c index dbea9da..a9d1f16 100644 --- libgomp/plugin/plugin-nvptx.c +++ libgomp/plugin/plugin-nvptx.c @@ -1275,10 +1275,38 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs, api_info); } + acc_event_info wait_event_info; + if (profiling_dispatch_p) + { + prof_info->event_type = acc_ev_wait_start; + + wait_event_info.other_event.event_type = prof_info->event_type; + wait_event_info.other_event.valid_bytes + = _ACC_OTHER_EVENT_INFO_VALID_BYTES; + wait_event_info.other_event.parent_construct + /* TODO = compute_construct_event_info.other_event.parent_construct */ + = acc_construct_parallel; //TODO: kernels... + wait_event_info.other_event.implicit = 1; + wait_event_info.other_event.tool_info = NULL; + + api_info->device_api = acc_device_api_cuda; + } #ifndef DISABLE_ASYNC if (async < acc_async_noval) { + if (profiling_dispatch_p) + { + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &wait_event_info, + api_info); + } r = cuStreamSynchronize (dev_str->stream); + if (profiling_dispatch_p) + { + prof_info->event_type = acc_ev_wait_end; + wait_event_info.other_event.event_type = prof_info->event_type; + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &wait_event_info, + api_info); + } if (r == CUDA_ERROR_LAUNCH_FAILED) GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s %s\n", cuda_error (r), maybe_abort_msg); @@ -1305,7 +1333,19 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs, event_add (PTX_EVT_KNL, e, (void *)dev_str, 0); } #else + if (profiling_dispatch_p) + { + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &wait_event_info, + api_info); + } r = cuCtxSynchronize (); + if (profiling_dispatch_p) + { + prof_info->event_type = acc_ev_wait_end; + wait_event_info.other_event.event_type = prof_info->event_type; + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &wait_event_info, + api_info); + } if (r == CUDA_ERROR_LAUNCH_FAILED) GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s %s\n", cuda_error (r), maybe_abort_msg); @@ -1664,7 +1704,44 @@ nvptx_wait (int async) GOMP_PLUGIN_debug (0, " %s: waiting on async=%d\n", __FUNCTION__, async); + struct goacc_thread *thr = GOMP_PLUGIN_goacc_thread (); + bool profiling_dispatch_p + = __builtin_expect (thr != NULL && thr->prof_info != NULL, false); + acc_event_info wait_event_info; + if (profiling_dispatch_p) + { + acc_prof_info *prof_info = thr->prof_info; + acc_api_info *api_info = thr->api_info; + + prof_info->event_type = acc_ev_wait_start; + + wait_event_info.other_event.event_type = prof_info->event_type; + wait_event_info.other_event.valid_bytes + = _ACC_OTHER_EVENT_INFO_VALID_BYTES; + wait_event_info.other_event.parent_construct + /* TODO = compute_construct_event_info.other_event.parent_construct */ + = acc_construct_parallel; //TODO: kernels... + wait_event_info.other_event.implicit = 1; + wait_event_info.other_event.tool_info = NULL; + + api_info->device_api = acc_device_api_cuda; + + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &wait_event_info, + api_info); + } CUDA_CALL_ASSERT (cuStreamSynchronize, s->stream); + if (profiling_dispatch_p) + { + acc_prof_info *prof_info = thr->prof_info; + acc_api_info *api_info = thr->api_info; + + prof_info->event_type = acc_ev_wait_end; + + wait_event_info.other_event.event_type = prof_info->event_type; + + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &wait_event_info, + api_info); + } event_gc (true); } @@ -1706,10 +1783,28 @@ nvptx_wait_all (void) CUresult r; struct ptx_stream *s; pthread_t self = pthread_self (); - struct nvptx_thread *nvthd = nvptx_thread (); + struct goacc_thread *thr = GOMP_PLUGIN_goacc_thread (); + struct nvptx_thread *nvthd = (struct nvptx_thread *) thr->target_tls; pthread_mutex_lock (&nvthd->ptx_dev->stream_lock); + acc_prof_info *prof_info = thr->prof_info; + acc_event_info wait_event_info; + acc_api_info *api_info = thr->api_info; + bool profiling_dispatch_p = __builtin_expect (prof_info != NULL, false); + if (profiling_dispatch_p) + { + wait_event_info.other_event.valid_bytes + = _ACC_OTHER_EVENT_INFO_VALID_BYTES; + wait_event_info.other_event.parent_construct + /* TODO = compute_construct_event_info.other_event.parent_construct */ + = acc_construct_parallel; //TODO: kernels... + wait_event_info.other_event.implicit = 1; + wait_event_info.other_event.tool_info = NULL; + + api_info->device_api = acc_device_api_cuda; + } + /* Wait for active streams initiated by this thread (or by multiple threads) to complete. */ for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next) @@ -1722,7 +1817,23 @@ nvptx_wait_all (void) else if (r != CUDA_ERROR_NOT_READY) GOMP_PLUGIN_fatal ("cuStreamQuery error: %s", cuda_error (r)); + if (profiling_dispatch_p) + { + prof_info->event_type = acc_ev_wait_start; + wait_event_info.other_event.event_type = prof_info->event_type; + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, + &wait_event_info, + api_info); + } CUDA_CALL_ASSERT (cuStreamSynchronize, s->stream); + if (profiling_dispatch_p) + { + prof_info->event_type = acc_ev_wait_end; + wait_event_info.other_event.event_type = prof_info->event_type; + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, + &wait_event_info, + api_info); + } } }