Message ID | 87fvptqm41.fsf@kepler.schwinge.homeip.net |
---|---|
State | New |
Headers | show |
On Mon, Dec 16, 2013 at 08:41:02AM +0100, Thomas Schwinge wrote:
> Here is the patch I propose for gomp-4_0-branch; OK?
No. The reason for 3 separate arrays is that some of the values
are always variable, some are sometimes variable (sizes), some are
never variable (alignment + kind).
So, if anything, the change would be to make the last array ushort instead of
uchar. Plus, the change, being an ABI change, would need to be done on the
trunk rather than just on gomp-4_0-branch.
But, I don't have time right now to read the OpenACC standard and am not
convinced whether it is actually desirable to use the same library
entrypoint for OpenACC when it clearly isn't a 1:1 match in behavior.
You'll need extra code to implement the forceful mapping/unmapping even when
something is already mapped (OpenMP doesn't have that), also from your
example it wasn't clear if GOMP_target_data/GOMP_target_data_end map to
anything in OpenACC (it looked like you have separate enter and exit
directives for the region rather than one with structured block, and you can
supply clauses on either of them).
Jakub
Hi! On Mon, 16 Dec 2013 16:38:18 +0100, Jakub Jelinek <jakub@redhat.com> wrote: > On Mon, Dec 16, 2013 at 08:41:02AM +0100, Thomas Schwinge wrote: > > Here is the patch I propose for gomp-4_0-branch; OK? > > No. The reason for 3 separate arrays is that some of the values > are always variable, some are sometimes variable (sizes), some are > never variable (alignment + kind). > So, if anything, the change would be to make the last array ushort instead of > uchar. Sure, that's fine by me, and thanks for the explanation. > Plus, the change, being an ABI change, would need to be done on the > trunk rather than just on gomp-4_0-branch. It's not an ABI change, as I had limited the change to the GOACC_parallel function, which does not yet exist in trunk. I can certainly prepare this ABI change for trunk if you'd like this seen changed for OpenMP, too. (Which I had proposed earlier, but not received comments on, so I went on with the solution I posted.) > But, I don't have time right now to read the OpenACC standard and am not > convinced whether it is actually desirable to use the same library > entrypoint for OpenACC when it clearly isn't a 1:1 match in behavior. We're not using the same entry points. We have GOACC_parallel, and will be adding more. Internally in libgomp, GOACC_parallel currently invokes the existing GOMP_target, because that is close enough to get something to work, even if not particularly complying with the OpenACC semantics (as we discussed), but that's only a stop-gap solution. Sorry if that wasn't obvious. At the GNU Tools Cauldron, I've been asked (at least by Jeff Law, maybe others, too) that we send incremental patches instead of just »big blobs of code after having worked on it for months« (approximation of Jeff's words), so that's what I'm trying. > You'll need extra code to implement the forceful mapping/unmapping even when > something is already mapped (OpenMP doesn't have that), also from your > example it wasn't clear if GOMP_target_data/GOMP_target_data_end map to > anything in OpenACC (it looked like you have separate enter and exit > directives for the region rather than one with structured block, and you can > supply clauses on either of them). Right, in OpenACC, there are several possibilities of setting up and retiring memory mapping regions. We're preparing patches for extending the existing libgomp memory mapping code to also satisfy the OpenACC requirements additionally to the existing OpenMP ones. At this point, and as we add support for additional OpenACC constructs and data clauses, we'll then be able to use the libgomp code with the OpenACC semantics, without duplicating the existing code implementing the OpenMP semantics. Grüße, Thomas
On Mon, Dec 16, 2013 at 05:35:36PM +0100, Thomas Schwinge wrote: > Sure, that's fine by me, and thanks for the explanation. > > > Plus, the change, being an ABI change, would need to be done on the > > trunk rather than just on gomp-4_0-branch. > > It's not an ABI change, as I had limited the change to the GOACC_parallel > function, which does not yet exist in trunk. Ah, ok then, I've missed that. I'd indeed prefer if you just used one array, it can be say just uchar array of twice the width, with even chars for alignment and odd for kinds (or vice versa), compared to two arrays it is tiny bit cheaper at the caller side IMHO. Jakub
Hi! This one's owed to me still learning about GCC internals; if someone could please be so kind to poit me to the appropriate documentation, or explain: On Mon, 16 Dec 2013 16:38:18 +0100, Jakub Jelinek <jakub@redhat.com> wrote: > The reason for 3 separate arrays is that some of the values > are always variable, some are sometimes variable (sizes), some are > never variable (alignment + kind). Related to this, in gcc/omp-low.c:lower_omp_target, I see: tree clobber = build_constructor (ctx->record_type, NULL); TREE_THIS_VOLATILE (clobber) = 1; gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl, clobber)); I'm assuming the point of this clobber is to tell the compiler (because it can't figure this out on its own?) that afterwards, after the gimple_seq olist has been "executed", we're not going to use the ctx->sender_decl object anymore, right? What would happen if this clobber were not added? Missed optimizations due to the object being kept alive, or correctness issues, or something else? And, why doesn't the same also need to be done for the sizes object (in the non-static case)? Grüße, Thomas
On Wed, Dec 18, 2013 at 09:03:40PM +0100, Thomas Schwinge wrote: > This one's owed to me still learning about GCC internals; if someone > could please be so kind to poit me to the appropriate documentation, or > explain: > > On Mon, 16 Dec 2013 16:38:18 +0100, Jakub Jelinek <jakub@redhat.com> wrote: > > The reason for 3 separate arrays is that some of the values > > are always variable, some are sometimes variable (sizes), some are > > never variable (alignment + kind). > > Related to this, in gcc/omp-low.c:lower_omp_target, I see: > > tree clobber = build_constructor (ctx->record_type, NULL); > TREE_THIS_VOLATILE (clobber) = 1; > gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl, > clobber)); Clobber stmt is an artificial statement that tells various optimization passes that the decl is dead at that point, so e.g. DSE can remove stores to the decl only followed by the clobber, or cfgexpand automatic variable layout code can be able to better share stack slots for variables that aren't live concurrently. It is purely optimization thing right. Given that the address of the object is passed to some other function, it might help the compiler to find out that the function doesn't remember that address somewhere, making the object live for longer than it is. Jakub
diff --git gcc/builtin-types.def gcc/builtin-types.def index e7bfaf9..59660cd 100644 --- gcc/builtin-types.def +++ gcc/builtin-types.def @@ -529,6 +529,9 @@ DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR, BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG, BT_BOOL, BT_UINT, BT_PTR) +DEF_FUNCTION_TYPE_8 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_PTR, + BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE, + BT_PTR, BT_PTR, BT_PTR, BT_PTR) DEF_FUNCTION_TYPE_VAR_0 (BT_FN_VOID_VAR, BT_VOID) DEF_FUNCTION_TYPE_VAR_0 (BT_FN_INT_VAR, BT_INT) diff --git gcc/fortran/types.def gcc/fortran/types.def index 9bbee35..9ec752a 100644 --- gcc/fortran/types.def +++ gcc/fortran/types.def @@ -213,5 +213,8 @@ DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR, BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG, BT_BOOL, BT_UINT, BT_PTR) +DEF_FUNCTION_TYPE_8 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_PTR, + BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE, + BT_PTR, BT_PTR, BT_PTR, BT_PTR) DEF_FUNCTION_TYPE_VAR_0 (BT_FN_VOID_VAR, BT_VOID) diff --git gcc/oacc-builtins.def gcc/oacc-builtins.def index a75e42d..5057e13 100644 --- gcc/oacc-builtins.def +++ gcc/oacc-builtins.def @@ -28,4 +28,5 @@ along with GCC; see the file COPYING3. If not see See builtins.def for details. */ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel", - BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST) + BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_PTR, + ATTR_NOTHROW_LIST) diff --git gcc/omp-low.c gcc/omp-low.c index e0f7d1d..ce99835 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -4886,7 +4886,7 @@ expand_oacc_parallel (struct omp_region *region) } /* Emit a library call to launch CHILD_FN. */ - tree t1, t2, t3, t4, device, c, clauses; + tree t1, t2, t3, t4, t5, device, c, clauses; enum built_in_function start_ix; location_t clause_loc; @@ -4918,6 +4918,7 @@ expand_oacc_parallel (struct omp_region *region) t2 = build_zero_cst (ptr_type_node); t3 = t2; t4 = t2; + t5 = t2; } else { @@ -4926,6 +4927,7 @@ expand_oacc_parallel (struct omp_region *region) t2 = build_fold_addr_expr (TREE_VEC_ELT (t, 0)); t3 = build_fold_addr_expr (TREE_VEC_ELT (t, 1)); t4 = build_fold_addr_expr (TREE_VEC_ELT (t, 2)); + t5 = build_fold_addr_expr (TREE_VEC_ELT (t, 3)); } gimple g; @@ -4935,7 +4937,7 @@ expand_oacc_parallel (struct omp_region *region) tree openmp_target = build_zero_cst (ptr_type_node); tree fnaddr = build_fold_addr_expr (child_fn); g = gimple_build_call (builtin_decl_explicit (start_ix), - 7, device, fnaddr, openmp_target, t1, t2, t3, t4); + 8, device, fnaddr, openmp_target, t1, t2, t3, t4, t5); gimple_set_location (g, gimple_location (entry_stmt)); gsi_insert_before (&gsi, g, GSI_SAME_STMT); } @@ -8766,7 +8768,7 @@ lower_oacc_parallel (gimple_stmt_iterator *gsi_p, omp_context *ctx) = create_tmp_var (ctx->record_type, ".omp_data_arr"); DECL_NAMELESS (ctx->sender_decl) = 1; TREE_ADDRESSABLE (ctx->sender_decl) = 1; - t = make_tree_vec (3); + t = make_tree_vec (4); TREE_VEC_ELT (t, 0) = ctx->sender_decl; TREE_VEC_ELT (t, 1) = create_tmp_var (build_array_type_nelts (size_type_node, map_cnt), @@ -8777,15 +8779,24 @@ lower_oacc_parallel (gimple_stmt_iterator *gsi_p, omp_context *ctx) TREE_VEC_ELT (t, 2) = create_tmp_var (build_array_type_nelts (unsigned_char_type_node, map_cnt), - ".omp_data_kinds"); + ".omp_data_alignments"); DECL_NAMELESS (TREE_VEC_ELT (t, 2)) = 1; TREE_ADDRESSABLE (TREE_VEC_ELT (t, 2)) = 1; TREE_STATIC (TREE_VEC_ELT (t, 2)) = 1; + TREE_VEC_ELT (t, 3) + = create_tmp_var (build_array_type_nelts (unsigned_char_type_node, + map_cnt), + ".omp_data_kinds"); + DECL_NAMELESS (TREE_VEC_ELT (t, 3)) = 1; + TREE_ADDRESSABLE (TREE_VEC_ELT (t, 3)) = 1; + TREE_STATIC (TREE_VEC_ELT (t, 3)) = 1; gimple_oacc_parallel_set_data_arg (stmt, t); vec<constructor_elt, va_gc> *vsize; + vec<constructor_elt, va_gc> *valign; vec<constructor_elt, va_gc> *vkind; vec_alloc (vsize, map_cnt); + vec_alloc (valign, map_cnt); vec_alloc (vkind, map_cnt); unsigned int map_idx = 0; @@ -8884,6 +8895,14 @@ lower_oacc_parallel (gimple_stmt_iterator *gsi_p, omp_context *ctx) if (TREE_CODE (s) != INTEGER_CST) TREE_STATIC (TREE_VEC_ELT (t, 1)) = 0; + unsigned int talign = TYPE_ALIGN_UNIT (TREE_TYPE (ovar)); + if (DECL_P (ovar) && DECL_ALIGN_UNIT (ovar) > talign) + talign = DECL_ALIGN_UNIT (ovar); + talign = ceil_log2 (talign); + CONSTRUCTOR_APPEND_ELT (valign, purpose, + build_int_cst (unsigned_char_type_node, + talign)); + unsigned char tkind = 0; switch (OMP_CLAUSE_CODE (c)) { @@ -8899,14 +8918,10 @@ lower_oacc_parallel (gimple_stmt_iterator *gsi_p, omp_context *ctx) default: gcc_unreachable (); } - unsigned int talign = TYPE_ALIGN_UNIT (TREE_TYPE (ovar)); - if (DECL_P (ovar) && DECL_ALIGN_UNIT (ovar) > talign) - talign = DECL_ALIGN_UNIT (ovar); - talign = ceil_log2 (talign); - tkind |= talign << 3; CONSTRUCTOR_APPEND_ELT (vkind, purpose, build_int_cst (unsigned_char_type_node, tkind)); + if (nc && nc != c) c = nc; } @@ -8916,7 +8931,9 @@ lower_oacc_parallel (gimple_stmt_iterator *gsi_p, omp_context *ctx) DECL_INITIAL (TREE_VEC_ELT (t, 1)) = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 1)), vsize); DECL_INITIAL (TREE_VEC_ELT (t, 2)) - = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 2)), vkind); + = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 2)), valign); + DECL_INITIAL (TREE_VEC_ELT (t, 3)) + = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 3)), vkind); if (!TREE_STATIC (TREE_VEC_ELT (t, 1))) { gimple_seq initlist = NULL; diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h index 394f3a8..06d7750 100644 --- libgomp/libgomp_g.h +++ libgomp/libgomp_g.h @@ -217,6 +217,7 @@ extern void GOMP_teams (unsigned int, unsigned int); /* oacc-parallel.c */ extern void GOACC_parallel (int, void (*) (void *), const void *, - size_t, void **, size_t *, unsigned char *); + size_t, void **, size_t *, unsigned char *, + unsigned char *); #endif /* LIBGOMP_G_H */ diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c index 730b83b..6cc04e1 100644 --- libgomp/oacc-parallel.c +++ libgomp/oacc-parallel.c @@ -25,12 +25,24 @@ /* This file handles the OpenACC parallel construct. */ +#include "libgomp.h" #include "libgomp_g.h" void GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target, size_t mapnum, void **hostaddrs, size_t *sizes, - unsigned char *kinds) + unsigned char *alignments, unsigned char *kinds) { + size_t i; + + for (i = 0; i < mapnum; ++i) + { + if (kinds[i] > 4) + gomp_fatal ("memory mapping kind %x for %zd is not yet supported", + kinds[i], i); + + kinds[i] |= alignments[i] << 3; + } + GOMP_target (device, fn, openmp_target, mapnum, hostaddrs, sizes, kinds); } diff --git libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c index b9bdffa..142c394 100644 --- libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c +++ libgomp/testsuite/libgomp.oacc-c/goacc_parallel.c @@ -17,7 +17,8 @@ f (void *data) int main(void) { i = -1; - GOACC_parallel (0, f, (const void *) 0, 0, (void *) 0, (void *) 0, (void *) 0); + GOACC_parallel (0, f, (const void *) 0, + 0, (void *) 0, (void *) 0, (void *) 0, (void *) 0); if (i != 42) abort ();