From patchwork Thu Nov 12 11:16:21 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 543310 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 9F5CD14016A for ; Thu, 12 Nov 2015 22:16:53 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=cqWrMpAu; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-type; q=dns; s=default; b=px79RAsR0J7+4hqQ JIkkfAZrbNNX2ccay/iLbMpEtyfwM7hq90Q6+BGsBm2BNjV8kRUaR55hZiUT0Kwj wymKWNAHQwgK9naI0q9hmswn6NQ2QFiBRIs+YOmC76NmFQx18gL9sVo2xLPQuTXQ lJ4iTCI6mphzhBMX1J7nHTe32xY= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-type; s=default; bh=nDDsgn7Ghdt2UCImhkrhIv tGDP0=; b=cqWrMpAuOEgPW9ijvIFcCKngt3AGQwXP1eXNzZIBjoEUxoIstXrZwP jFwGJyavb1EtzR24y9eSfEL96bG1FzhewEKyYcwXXd2v7ajkYrOjrK+6rFSYHkqS z47QAK29mIJJ9TYzE0YgfLpMaTubFScraAOO7F7geC1SSLiDXo5qA= Received: (qmail 111770 invoked by alias); 12 Nov 2015 11:16:43 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 111753 invoked by uid 89); 12 Nov 2015 11:16:41 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL, BAYES_50, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 12 Nov 2015 11:16:36 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1Zwprj-00010l-BB from Julian_Brown@mentor.com ; Thu, 12 Nov 2015 03:16:32 -0800 Received: from octopus (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Thu, 12 Nov 2015 11:16:29 +0000 Date: Thu, 12 Nov 2015 11:16:21 +0000 From: Julian Brown To: Jakub Jelinek CC: James Norris , GCC Patches , "Joseph S. Myers" , Nathan Sidwell Subject: Re: [OpenACC 0/7] host_data construct Message-ID: <20151112111621.657650bc@octopus> In-Reply-To: <20151102183339.365c3d33@octopus> References: <56293476.5020801@codesourcery.com> <562A578E.4080907@codesourcery.com> <20151026183422.GW478@tucnak.redhat.com> <20151102183339.365c3d33@octopus> MIME-Version: 1.0 X-IsSubscribed: yes On Mon, 2 Nov 2015 18:33:39 +0000 Julian Brown wrote: > On Mon, 26 Oct 2015 19:34:22 +0100 > Jakub Jelinek wrote: > > > Your use_device sounds very similar to use_device_ptr clause in > > OpenMP, which is allowed on #pragma omp target data construct and is > > implemented quite a bit differently from this; it is unclear if the > > OpenACC standard requires this kind of implementation, or you just > > chose to implement it this way. In particular, the GOMP_target_data > > call puts the variables mentioned in the use_device_ptr clauses into > > the mapping structures (similarly how map clause appears) and the > > corresponding vars are privatized within the target data region > > (which is a host region, basically a fancy { } braces), where the > > private variables contain the offloading device's pointers. > > As the author of the original patch, I have to say using the mapping > structures seems like a far better approach, but I've hit some trouble > with the details of adapting OpenACC to use that method. Here's a version of the patch which (hopefully) brings OpenACC on par with OpenMP with respect to use_device/use_device_ptr variables. The implementation is essentially the same now for OpenACC as for OpenMP (i.e. using mapping structures): so for now, only array or pointer variables can be used as use_device variables. The included tests have been adjusted accordingly. One awkward part of the implementation concerns nesting offloaded regions within host_data regions: #define N 1024 int main (int argc, char* argv[]) { int x[N]; #pragma acc data copyin (x[0:N]) { int *xp; #pragma acc host_data use_device (x) { [...] #pragma acc parallel present (x) copyout (xp) { xp = x; } } assert (xp == acc_deviceptr (x)); } return 0; } I think the meaning of 'x' as seen within the clauses of the parallel directive should be the *host* version of x, not the mapped target address (I've asked on the OpenACC technical mailing list to clarify this point, but no reply as yet). The changes to {maybe_,}lookup_decl_in_outer_ctx "skip over" host_data contexts when called from lower_omp_target. There's probably an analogous case for OpenMP, but I've not tried to handle that. No regressions for libgomp tests, and the new tests pass. OK for trunk? Thanks, Julian ChangeLog Julian Brown Cesar Philippidis James Norris gcc/ * c-family/c-pragma.c (oacc_pragmas): Add PRAGMA_OACC_HOST_DATA. * c-family/c-pragma.h (pragma_kind): Add PRAGMA_OACC_HOST_DATA. (pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_USE_DEVICE. * c/c-parser.c (c_parser_omp_clause_name): Add use_device support. (c_parser_oacc_clause_use_device): New function. (c_parser_oacc_all_clauses): Add use_device support. (OACC_HOST_DATA_CLAUSE_MASK): New macro. (c_parser_oacc_host_data): New function. (c_parser_omp_construct): Add host_data support. * c/c-tree.h (c_finish_oacc_host_data): Add prototype. * c/c-typeck.c (c_finish_oacc_host_data): New function. (c_finish_omp_clauses): Add use_device support. * cp/cp-tree.h (finish_oacc_host_data): Add prototype. * cp/parser.c (cp_parser_omp_clause_name): Add use_device support. (cp_parser_oacc_all_clauses): Add use_device support. (OACC_HOST_DATA_CLAUSE_MASK): New macro. (cp_parser_oacc_host_data): New function. (cp_parser_omp_construct): Add host_data support. (cp_parser_pragma): Add host_data support. * cp/semantics.c (finish_omp_clauses): Add use_device support. (finish_oacc_host_data): New function. * gimple-pretty-print.c (dump_gimple_omp_target): Add host_data support. * gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_HOST_DATA. (is_gimple_omp_oacc): Add support for above. * gimplify.c (gimplify_scan_omp_clauses): Add host_data, use_device support. (gimplify_omp_workshare): Add host_data support. (gimplify_expr): Likewise. * omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): New. * omp-low.c (lookup_decl_in_outer_ctx) (maybe_lookup_decl_in_outer_ctx): Add optional argument to skip host_data regions. (scan_sharing_clauses): Support use_device. (check_omp_nesting_restrictions): Support host_data. (expand_omp_target): Support host_data. (lower_omp_target): Skip over outer host_data regions when looking up decls. Support use_device. (make_gimple_omp_edges): Support host_data. * tree-nested.c (convert_nonlocal_omp_clauses): Add use_device clause. libgomp/ * oacc-parallel.c (GOACC_host_data): New function. * libgomp.map (GOACC_host_data): Add to GOACC_2.0.1. * testsuite/libgomp.oacc-c-c++-common/host_data-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/host_data-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/host_data-3.c: New test. * testsuite/libgomp.oacc-c-c++-common/host_data-4.c: New test. * testsuite/libgomp.oacc-c-c++-common/host_data-5.c: New test. commit ac4269627c5b3f5d5c20fab7517c066ae6dfce74 Author: Julian Brown Date: Mon Nov 2 06:31:47 2015 -0800 OpenACC host_data support using mapping regions. diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c index f86ed38..3b30191 100644 --- a/gcc/c-family/c-pragma.c +++ b/gcc/c-family/c-pragma.c @@ -1250,6 +1250,7 @@ static const struct omp_pragma_def oacc_pragmas[] = { { "data", PRAGMA_OACC_DATA }, { "enter", PRAGMA_OACC_ENTER_DATA }, { "exit", PRAGMA_OACC_EXIT_DATA }, + { "host_data", PRAGMA_OACC_HOST_DATA }, { "kernels", PRAGMA_OACC_KERNELS }, { "loop", PRAGMA_OACC_LOOP }, { "parallel", PRAGMA_OACC_PARALLEL }, diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h index afeceff..2ad7356 100644 --- a/gcc/c-family/c-pragma.h +++ b/gcc/c-family/c-pragma.h @@ -32,6 +32,7 @@ enum pragma_kind { PRAGMA_OACC_DATA, PRAGMA_OACC_ENTER_DATA, PRAGMA_OACC_EXIT_DATA, + PRAGMA_OACC_HOST_DATA, PRAGMA_OACC_KERNELS, PRAGMA_OACC_LOOP, PRAGMA_OACC_PARALLEL, @@ -165,6 +166,7 @@ enum pragma_omp_clause { PRAGMA_OACC_CLAUSE_SELF, PRAGMA_OACC_CLAUSE_SEQ, PRAGMA_OACC_CLAUSE_TILE, + PRAGMA_OACC_CLAUSE_USE_DEVICE, PRAGMA_OACC_CLAUSE_VECTOR, PRAGMA_OACC_CLAUSE_VECTOR_LENGTH, PRAGMA_OACC_CLAUSE_WAIT, diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 2484b92..8b048a3 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -10139,6 +10139,8 @@ c_parser_omp_clause_name (c_parser *parser) result = PRAGMA_OMP_CLAUSE_UNTIED; else if (!strcmp ("use_device_ptr", p)) result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR; + else if (!strcmp ("use_device", p)) + result = PRAGMA_OACC_CLAUSE_USE_DEVICE; break; case 'v': if (!strcmp ("vector", p)) @@ -11485,6 +11487,15 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list) return c; } +/* OpenACC 2.0: + use_device ( variable-list ) */ + +static tree +c_parser_oacc_clause_use_device (c_parser *parser, tree list) +{ + return c_parser_omp_var_list_parens (parser, OMP_CLAUSE_USE_DEVICE, list); +} + /* OpenACC: wait ( int-expr-list ) */ @@ -12786,6 +12797,10 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask, clauses = c_parser_oacc_data_clause (parser, c_kind, clauses); c_name = "self"; break; + case PRAGMA_OACC_CLAUSE_USE_DEVICE: + clauses = c_parser_oacc_clause_use_device (parser, clauses); + c_name = "use_device"; + break; case PRAGMA_OACC_CLAUSE_SEQ: clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ, clauses); @@ -13280,6 +13295,29 @@ c_parser_oacc_enter_exit_data (c_parser *parser, bool enter) /* OpenACC 2.0: + # pragma acc host_data oacc-data-clause[optseq] new-line + structured-block +*/ + +#define OACC_HOST_DATA_CLAUSE_MASK \ + ( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_USE_DEVICE) ) + +static tree +c_parser_oacc_host_data (location_t loc, c_parser *parser) +{ + tree stmt, clauses, block; + + clauses = c_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK, + "#pragma acc host_data"); + + block = c_begin_omp_parallel (); + add_stmt (c_parser_omp_structured_block (parser)); + stmt = c_finish_oacc_host_data (loc, clauses, block); + return stmt; +} + + +/* OpenACC 2.0: # pragma acc loop oacc-loop-clause[optseq] new-line structured-block @@ -16573,6 +16611,9 @@ c_parser_omp_construct (c_parser *parser) case PRAGMA_OACC_DATA: stmt = c_parser_oacc_data (loc, parser); break; + case PRAGMA_OACC_HOST_DATA: + stmt = c_parser_oacc_host_data (loc, parser); + break; case PRAGMA_OACC_KERNELS: case PRAGMA_OACC_PARALLEL: strcpy (p_name, "#pragma acc"); diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h index 04991f7..f332661 100644 --- a/gcc/c/c-tree.h +++ b/gcc/c/c-tree.h @@ -642,6 +642,7 @@ extern tree c_finish_goto_ptr (location_t, tree); extern tree c_expr_to_decl (tree, bool *, bool *); extern tree c_finish_omp_construct (location_t, enum tree_code, tree, tree); extern tree c_finish_oacc_data (location_t, tree, tree); +extern tree c_finish_oacc_host_data (location_t, tree, tree); extern tree c_begin_omp_parallel (void); extern tree c_finish_omp_parallel (location_t, tree, tree); extern tree c_begin_omp_task (void); diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 4335a87..12edfba 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -11541,6 +11541,25 @@ c_finish_oacc_data (location_t loc, tree clauses, tree block) return add_stmt (stmt); } +/* Generate OACC_HOST_DATA, with CLAUSES and BLOCK as its compound + statement. LOC is the location of the OACC_HOST_DATA. */ + +tree +c_finish_oacc_host_data (location_t loc, tree clauses, tree block) +{ + tree stmt; + + block = c_end_compound_stmt (loc, block, true); + + stmt = make_node (OACC_HOST_DATA); + TREE_TYPE (stmt) = void_type_node; + OACC_HOST_DATA_CLAUSES (stmt) = clauses; + OACC_HOST_DATA_BODY (stmt) = block; + SET_EXPR_LOCATION (stmt, loc); + + return add_stmt (stmt); +} + /* Like c_begin_compound_stmt, except force the retention of the BLOCK. */ tree @@ -12981,6 +13000,7 @@ c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd) bitmap_set_bit (&map_head, DECL_UID (t)); goto check_dup_generic; + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_IS_DEVICE_PTR: case OMP_CLAUSE_USE_DEVICE_PTR: t = OMP_CLAUSE_DECL (c); diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 828f268..11bd663 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -6317,6 +6317,7 @@ extern void finish_omp_threadprivate (tree); extern tree begin_omp_structured_block (void); extern tree finish_omp_structured_block (tree); extern tree finish_oacc_data (tree, tree); +extern tree finish_oacc_host_data (tree, tree); extern tree finish_omp_construct (enum tree_code, tree, tree); extern tree begin_omp_parallel (void); extern tree finish_omp_parallel (tree, tree); diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index a87675e..20c19b1 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -29262,6 +29262,8 @@ cp_parser_omp_clause_name (cp_parser *parser) result = PRAGMA_OMP_CLAUSE_UNTIED; else if (!strcmp ("use_device_ptr", p)) result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR; + else if (!strcmp ("use_device", p)) + result = PRAGMA_OACC_CLAUSE_USE_DEVICE; break; case 'v': if (!strcmp ("vector", p)) @@ -31614,6 +31616,11 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask, clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses); c_name = "self"; break; + case PRAGMA_OACC_CLAUSE_USE_DEVICE: + clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_USE_DEVICE, + clauses); + c_name = "use_device"; + break; case PRAGMA_OACC_CLAUSE_SEQ: clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ, clauses, here); @@ -34525,6 +34532,30 @@ cp_parser_oacc_data (cp_parser *parser, cp_token *pragma_tok) return stmt; } +#define OACC_HOST_DATA_CLAUSE_MASK \ + ( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_USE_DEVICE) ) + +/* OpenACC 2.0: + # pragma acc host_data new-line + structured-block */ + +static tree +cp_parser_oacc_host_data (cp_parser *parser, cp_token *pragma_tok) +{ + tree stmt, clauses, block; + unsigned int save; + + clauses = cp_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK, + "#pragma acc host_data", pragma_tok); + + block = begin_omp_parallel (); + save = cp_parser_begin_omp_structured_block (parser); + cp_parser_statement (parser, NULL_TREE, false, NULL); + cp_parser_end_omp_structured_block (parser, save); + stmt = finish_oacc_host_data (clauses, block); + return stmt; +} + /* OpenACC 2.0: # pragma acc enter data oacc-enter-data-clause[optseq] new-line @@ -35789,6 +35820,9 @@ cp_parser_omp_construct (cp_parser *parser, cp_token *pragma_tok) case PRAGMA_OACC_EXIT_DATA: stmt = cp_parser_oacc_enter_exit_data (parser, pragma_tok, false); break; + case PRAGMA_OACC_HOST_DATA: + stmt = cp_parser_oacc_host_data (parser, pragma_tok); + break; case PRAGMA_OACC_KERNELS: case PRAGMA_OACC_PARALLEL: strcpy (p_name, "#pragma acc"); @@ -36363,6 +36397,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context) case PRAGMA_OACC_DATA: case PRAGMA_OACC_ENTER_DATA: case PRAGMA_OACC_EXIT_DATA: + case PRAGMA_OACC_HOST_DATA: case PRAGMA_OACC_KERNELS: case PRAGMA_OACC_PARALLEL: case PRAGMA_OACC_LOOP: diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index db37e85..36a1b25 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -6810,6 +6810,7 @@ finish_omp_clauses (tree clauses, bool allow_fields, bool declare_simd) } break; + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_IS_DEVICE_PTR: case OMP_CLAUSE_USE_DEVICE_PTR: field_ok = allow_fields; @@ -7365,6 +7366,24 @@ finish_oacc_data (tree clauses, tree block) return add_stmt (stmt); } +/* Generate OACC_HOST_DATA, with CLAUSES and BLOCK as its compound + statement. */ + +tree +finish_oacc_host_data (tree clauses, tree block) +{ + tree stmt; + + block = finish_omp_structured_block (block); + + stmt = make_node (OACC_HOST_DATA); + TREE_TYPE (stmt) = void_type_node; + OACC_HOST_DATA_CLAUSES (stmt) = clauses; + OACC_HOST_DATA_BODY (stmt) = block; + + return add_stmt (stmt); +} + /* Generate OMP construct CODE, with BODY and CLAUSES as its compound statement. */ diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c index 7b50cdf..c148c3c 100644 --- a/gcc/gimple-pretty-print.c +++ b/gcc/gimple-pretty-print.c @@ -1353,6 +1353,9 @@ dump_gimple_omp_target (pretty_printer *buffer, gomp_target *gs, case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA: kind = " oacc_enter_exit_data"; break; + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: + kind = " oacc_host_data"; + break; default: gcc_unreachable (); } diff --git a/gcc/gimple.h b/gcc/gimple.h index 781801b..c88da95 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -170,6 +170,7 @@ enum gf_mask { GF_OMP_TARGET_KIND_OACC_DATA = 7, GF_OMP_TARGET_KIND_OACC_UPDATE = 8, GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9, + GF_OMP_TARGET_KIND_OACC_HOST_DATA = 10, /* True on an GIMPLE_OMP_RETURN statement if the return does not require a thread synchronization via some sort of barrier. The exact barrier @@ -6004,6 +6005,7 @@ is_gimple_omp_oacc (const gimple *stmt) case GF_OMP_TARGET_KIND_OACC_DATA: case GF_OMP_TARGET_KIND_OACC_UPDATE: case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA: + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: return true; default: return false; diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 66e5168..1259061 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -6294,6 +6294,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_TARGET_DATA: case OMP_TARGET_ENTER_DATA: case OMP_TARGET_EXIT_DATA: + case OACC_HOST_DATA: ctx->target_firstprivatize_array_bases = true; default: break; @@ -6559,6 +6560,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_TARGET_DATA: case OMP_TARGET_ENTER_DATA: case OMP_TARGET_EXIT_DATA: + case OACC_HOST_DATA: if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER || (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_REFERENCE)) @@ -6968,6 +6970,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, } goto do_notice; + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_USE_DEVICE_PTR: flags = GOVD_FIRSTPRIVATE | GOVD_EXPLICIT; goto do_add; @@ -7203,7 +7206,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, break; case OMP_CLAUSE_DEVICE_RESIDENT: - case OMP_CLAUSE_USE_DEVICE: remove = true; break; @@ -8961,6 +8963,9 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) case OMP_TEAMS: ort = OMP_TEAMS_COMBINED (expr) ? ORT_COMBINED_TEAMS : ORT_TEAMS; break; + case OACC_HOST_DATA: + ort = ORT_TARGET_DATA; + break; default: gcc_unreachable (); } @@ -8982,6 +8987,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) switch (TREE_CODE (expr)) { case OACC_DATA: + case OACC_HOST_DATA: end_ix = BUILT_IN_GOACC_DATA_END; break; case OMP_TARGET_DATA: @@ -9013,6 +9019,10 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) stmt = gimple_build_omp_target (body, GF_OMP_TARGET_KIND_OACC_KERNELS, OMP_CLAUSES (expr)); break; + case OACC_HOST_DATA: + stmt = gimple_build_omp_target (body, GF_OMP_TARGET_KIND_OACC_HOST_DATA, + OMP_CLAUSES (expr)); + break; case OACC_PARALLEL: stmt = gimple_build_omp_target (body, GF_OMP_TARGET_KIND_OACC_PARALLEL, OMP_CLAUSES (expr)); @@ -10122,12 +10132,12 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, ret = GS_ALL_DONE; break; - case OACC_HOST_DATA: case OACC_DECLARE: sorry ("directive not yet implemented"); ret = GS_ALL_DONE; break; + case OACC_HOST_DATA: case OACC_DATA: case OACC_KERNELS: case OACC_PARALLEL: diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index 0b6bd58..109d374 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -47,6 +47,8 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update", DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait", BT_FN_VOID_INT_INT_VAR, ATTR_NOTHROW_LIST) +DEF_GOACC_BUILTIN (BUILT_IN_GOACC_HOST_DATA, "GOACC_host_data", + BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST) DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 51b471c..0bb993f 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -390,8 +390,8 @@ scan_omp_op (tree *tp, omp_context *ctx) } static void lower_omp (gimple_seq *, omp_context *); -static tree lookup_decl_in_outer_ctx (tree, omp_context *); -static tree maybe_lookup_decl_in_outer_ctx (tree, omp_context *); +static tree lookup_decl_in_outer_ctx (tree, omp_context *, bool = false); +static tree maybe_lookup_decl_in_outer_ctx (tree, omp_context *, bool = false); /* Find an OMP clause of type KIND within CLAUSES. */ @@ -1935,6 +1935,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) install_var_local (decl, ctx); break; + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_USE_DEVICE_PTR: decl = OMP_CLAUSE_DECL (c); if (TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE) @@ -2134,7 +2135,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) break; case OMP_CLAUSE_DEVICE_RESIDENT: - case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE__CACHE_: case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_AUTO: @@ -2288,6 +2288,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_SIMD: case OMP_CLAUSE_NOGROUP: case OMP_CLAUSE_DEFAULTMAP: + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_USE_DEVICE_PTR: case OMP_CLAUSE__CILK_FOR_COUNT_: case OMP_CLAUSE_ASYNC: @@ -2302,7 +2303,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) break; case OMP_CLAUSE_DEVICE_RESIDENT: - case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE__CACHE_: case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_AUTO: @@ -3608,6 +3608,8 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx) case GF_OMP_TARGET_KIND_OACC_UPDATE: stmt_name = "update"; break; case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA: stmt_name = "enter/exit data"; break; + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: stmt_name = "host_data"; + break; default: gcc_unreachable (); } switch (gimple_omp_target_kind (ctx->stmt)) @@ -3619,6 +3621,8 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx) case GF_OMP_TARGET_KIND_OACC_KERNELS: ctx_stmt_name = "kernels"; break; case GF_OMP_TARGET_KIND_OACC_DATA: ctx_stmt_name = "data"; break; + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: + ctx_stmt_name = "host_data"; break; default: gcc_unreachable (); } @@ -3941,13 +3945,22 @@ maybe_lookup_ctx (gimple *stmt) parallelism happens only rarely. */ static tree -lookup_decl_in_outer_ctx (tree decl, omp_context *ctx) +lookup_decl_in_outer_ctx (tree decl, omp_context *ctx, + bool skip_hostdata) { tree t; omp_context *up; for (up = ctx->outer, t = NULL; up && t == NULL; up = up->outer) - t = maybe_lookup_decl (decl, up); + { + if (skip_hostdata + && gimple_code (up->stmt) == GIMPLE_OMP_TARGET + && gimple_omp_target_kind (up->stmt) + == GF_OMP_TARGET_KIND_OACC_HOST_DATA) + continue; + + t = maybe_lookup_decl (decl, up); + } gcc_assert (!ctx->is_nested || t || is_global_var (decl)); @@ -3959,13 +3972,22 @@ lookup_decl_in_outer_ctx (tree decl, omp_context *ctx) in outer contexts. */ static tree -maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx) +maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx, + bool skip_hostdata) { tree t = NULL; omp_context *up; for (up = ctx->outer, t = NULL; up && t == NULL; up = up->outer) - t = maybe_lookup_decl (decl, up); + { + if (skip_hostdata + && gimple_code (up->stmt) == GIMPLE_OMP_TARGET + && gimple_omp_target_kind (up->stmt) + == GF_OMP_TARGET_KIND_OACC_HOST_DATA) + continue; + + t = maybe_lookup_decl (decl, up); + } return t ? t : decl; } @@ -12458,6 +12480,7 @@ expand_omp_target (struct omp_region *region) break; case GF_OMP_TARGET_KIND_DATA: case GF_OMP_TARGET_KIND_OACC_DATA: + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: data_region = true; break; default: @@ -12697,6 +12720,9 @@ expand_omp_target (struct omp_region *region) case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA: start_ix = BUILT_IN_GOACC_ENTER_EXIT_DATA; break; + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: + start_ix = BUILT_IN_GOACC_HOST_DATA; + break; default: gcc_unreachable (); } @@ -12820,6 +12846,7 @@ expand_omp_target (struct omp_region *region) { case BUILT_IN_GOACC_DATA_START: case BUILT_IN_GOMP_TARGET_DATA: + case BUILT_IN_GOACC_HOST_DATA: break; case BUILT_IN_GOMP_TARGET: case BUILT_IN_GOMP_TARGET_UPDATE: @@ -13127,6 +13154,7 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent, case GF_OMP_TARGET_KIND_OACC_PARALLEL: case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_DATA: + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: break; case GF_OMP_TARGET_KIND_UPDATE: case GF_OMP_TARGET_KIND_ENTER_DATA: @@ -14920,6 +14948,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) break; case GF_OMP_TARGET_KIND_DATA: case GF_OMP_TARGET_KIND_OACC_DATA: + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: data_region = true; break; default: @@ -15025,7 +15054,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) { if (TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE) { - if (is_global_var (maybe_lookup_decl_in_outer_ctx (var, ctx)) + if (is_global_var (maybe_lookup_decl_in_outer_ctx (var, ctx, + true)) && varpool_node::get_create (var)->offloadable) continue; @@ -15124,6 +15154,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) } break; + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_USE_DEVICE_PTR: case OMP_CLAUSE_IS_DEVICE_PTR: var = OMP_CLAUSE_DECL (c); @@ -15262,7 +15293,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) talign = DECL_ALIGN_UNIT (ovar); if (nc) { - var = lookup_decl_in_outer_ctx (ovar, ctx); + var = lookup_decl_in_outer_ctx (ovar, ctx, true); x = build_sender_ref (ovar, ctx); if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP @@ -15509,12 +15540,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) build_int_cstu (tkind_type, tkind)); break; + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_USE_DEVICE_PTR: case OMP_CLAUSE_IS_DEVICE_PTR: ovar = OMP_CLAUSE_DECL (c); var = lookup_decl_in_outer_ctx (ovar, ctx); x = build_sender_ref (ovar, ctx); - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_PTR) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_PTR + || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE) tkind = GOMP_MAP_USE_DEVICE_PTR; else tkind = GOMP_MAP_FIRSTPRIVATE_INT; @@ -15717,10 +15750,12 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) gimple_build_assign (new_var, x)); } break; + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_USE_DEVICE_PTR: case OMP_CLAUSE_IS_DEVICE_PTR: var = OMP_CLAUSE_DECL (c); - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_PTR) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_PTR + || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE) x = build_sender_ref (var, ctx); else x = build_receiver_ref (var, false, ctx); @@ -16707,6 +16742,7 @@ make_gimple_omp_edges (basic_block bb, struct omp_region **region, case GF_OMP_TARGET_KIND_OACC_PARALLEL: case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_DATA: + case GF_OMP_TARGET_KIND_OACC_HOST_DATA: break; case GF_OMP_TARGET_KIND_UPDATE: case GF_OMP_TARGET_KIND_ENTER_DATA: diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c index 1f6311c..7579cb6 100644 --- a/gcc/tree-nested.c +++ b/gcc/tree-nested.c @@ -1072,6 +1072,7 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) case OMP_CLAUSE_SHARED: case OMP_CLAUSE_TO_DECLARE: case OMP_CLAUSE_LINK: + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_USE_DEVICE_PTR: case OMP_CLAUSE_IS_DEVICE_PTR: do_decl_clause: @@ -1719,6 +1720,7 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) case OMP_CLAUSE_SHARED: case OMP_CLAUSE_TO_DECLARE: case OMP_CLAUSE_LINK: + case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_USE_DEVICE_PTR: case OMP_CLAUSE_IS_DEVICE_PTR: do_decl_clause: diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map index 39faba9..2e6561e 100644 --- a/libgomp/libgomp.map +++ b/libgomp/libgomp.map @@ -393,6 +393,7 @@ GOACC_2.0 { GOACC_2.0.1 { global: GOACC_parallel_keyed; + GOACC_host_data; } GOACC_2.0; GOMP_PLUGIN_1.0 { diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c index 525846b..f261dce 100644 --- a/libgomp/oacc-parallel.c +++ b/libgomp/oacc-parallel.c @@ -490,6 +490,46 @@ GOACC_wait (int async, int num_waits, ...) goacc_thread ()->dev->openacc.async_wait_all_async_func (acc_async_noval); } +void +GOACC_host_data (int device, size_t mapnum, + void **hostaddrs, size_t *sizes, unsigned short *kinds) +{ + bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK; + struct target_mem_desc *tgt; + +#ifdef HAVE_INTTYPES_H + gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p\n", + __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds); +#else + gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p\n", + __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds); +#endif + + goacc_lazy_initialize (); + + struct goacc_thread *thr = goacc_thread (); + struct gomp_device_descr *acc_dev = thr->dev; + + /* Host fallback or 'do nothing'. */ + if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + || host_fallback) + { + tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, + GOMP_MAP_VARS_OPENACC); + tgt->prev = thr->mapped_data; + thr->mapped_data = tgt; + + return; + } + + gomp_debug (0, " %s: prepare mappings\n", __FUNCTION__); + tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true, + GOMP_MAP_VARS_OPENACC); + gomp_debug (0, " %s: mappings prepared\n", __FUNCTION__); + tgt->prev = thr->mapped_data; + thr->mapped_data = tgt; +} + int GOACC_get_num_threads (void) { diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c new file mode 100644 index 0000000..8dc7c2d --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c @@ -0,0 +1,118 @@ +/* { dg-do run { target openacc_nvidia_accel_selected } } */ +/* { dg-additional-options "-lcuda -lcublas -lcudart" } */ + +#include +#include +#include +#include +#include + +void +saxpy_host (int n, float a, float *x, float *y) +{ + int i; + + for (i = 0; i < n; i++) + y[i] = y[i] + a * x[i]; +} + +#pragma acc routine +void +saxpy_target (int n, float a, float *x, float *y) +{ + int i; + + for (i = 0; i < n; i++) + y[i] = y[i] + a * x[i]; +} + +int +main(int argc, char **argv) +{ +#define N 8 + int i; + float x_ref[N], y_ref[N]; + float x[N], y[N]; + cublasHandle_t h; + float a = 2.0; + +#pragma acc data copyin (x[0:N]) copy (y[0:N]) + { + float *xp, *yp; +#pragma acc host_data use_device (x, y) + { +#pragma acc parallel pcopy (xp, yp) + { + xp = x; + yp = y; + } + } + + if (xp != acc_deviceptr (x) || yp != acc_deviceptr (y)) + abort (); + } + + for (i = 0; i < N; i++) + { + x[i] = x_ref[i] = 4.0 + i; + y[i] = y_ref[i] = 3.0; + } + + saxpy_host (N, a, x_ref, y_ref); + + cublasCreate (&h); + +#pragma acc data copyin (x[0:N]) copy (y[0:N]) + { +#pragma acc host_data use_device (x, y) + { + cublasSaxpy (h, N, &a, x, 1, y, 1); + } + } + + for (i = 0; i < N; i++) + { + if (y[i] != y_ref[i]) + abort (); + } + +#pragma acc data create (x[0:N]) copyout (y[0:N]) + { +#pragma acc kernels + for (i = 0; i < N; i++) + y[i] = 3.0; + +#pragma acc host_data use_device (x, y) + { + cublasSaxpy (h, N, &a, x, 1, y, 1); + } + } + + cublasDestroy (h); + + for (i = 0; i < N; i++) + { + if (y[i] != y_ref[i]) + abort (); + } + + for (i = 0; i < N; i++) + y[i] = 3.0; + +#pragma acc data copyin (x[0:N]) copyin (a) copy (y[0:N]) + { +#pragma acc host_data use_device (x, y) + { +#pragma acc parallel present (x[0:N]) pcopy (y[0:N]) present (a) + saxpy_target (N, a, x, y); + } + } + + for (i = 0; i < N; i++) + { + if (y[i] != y_ref[i]) + abort (); + } + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-2.c new file mode 100644 index 0000000..614f143 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-2.c @@ -0,0 +1,31 @@ +/* { dg-do run { target openacc_nvidia_accel_selected } } */ + +#include +#include + +char *global_in_host; + +void foo (char *in) +{ + if (!acc_is_present (global_in_host, sizeof (*global_in_host)) + || in != acc_deviceptr (global_in_host)) + abort (); +} + +int +main (int argc, char **argv) +{ + char mydata[1024]; + + global_in_host = mydata; + +#pragma acc data copyin(mydata) + { +#pragma acc host_data use_device (mydata) + { + foo (mydata); + } + } + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-3.c new file mode 100644 index 0000000..942a01d --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-3.c @@ -0,0 +1,28 @@ +/* { dg-do run { target openacc_nvidia_accel_selected } } */ + +#include +#include + +#define N 1024 + +int main (int argc, char* argv[]) +{ + int x[N]; + +#pragma acc data copyin (x[0:N]) + { + int *xp; +#pragma acc host_data use_device (x) + { +#pragma acc parallel present (x) copyout (xp) + { + xp = x; + } + } + + if (xp != acc_deviceptr (x)) + abort (); + } + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-4.c new file mode 100644 index 0000000..f53fc90 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-4.c @@ -0,0 +1,29 @@ +/* { dg-do run { target openacc_nvidia_accel_selected } } */ + +#include +#include + +#define N 1024 + +int main (int argc, char* argv[]) +{ + int x[N], *xp2; + +#pragma acc data copyin (x[0:N]) + { + int *xp; +#pragma acc host_data use_device (x) + { +#pragma acc data present (x) + { + xp = x; + } + xp2 = x; + } + + if (xp != acc_deviceptr (x) || xp2 != xp) + abort (); + } + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-5.c new file mode 100644 index 0000000..82c84a6 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-5.c @@ -0,0 +1,38 @@ +/* { dg-do run { target openacc_nvidia_accel_selected } } */ + +#include +#include + +#define N 1024 + +int main (int argc, char* argv[]) +{ + int x[N], y[N], *yp; + + yp = y + 1; + +#pragma acc data copyin (x[0:N]) + { + int *xp, *yp2; +#pragma acc host_data use_device (x) + { +#pragma acc data present (x) copyin (y) + { +#pragma acc host_data use_device (yp) + { + xp = x; + yp2 = yp; + } + + if (yp2 != acc_deviceptr (yp)) + abort (); + } + } + + if (xp != acc_deviceptr (x)) + abort (); + + } + + return 0; +}