Message ID | 22cdd4ec-7d14-03b1-6789-54788a573bca@codesourcery.com |
---|---|
State | New |
Series | [RFC] openmp: ensure variables in offload table are streamed out (PRs 94848 + 95551) (was: Re: [Patch][RFC] openmp: don't add artificial const decl to offload table (PRs 94848 + 95551)) |
On Mon, Jun 08, 2020 at 09:36:29PM +0200, Tobias Burnus wrote:
> I have now split off the missed-optimization task to a new
> PR, PR95583, to be handled in a proper way instead of trying
> to cook up a hackish special-case version.
>
> This patch now simply sets the force_output flag.
>
> (a) As output_offload_tables() (i.e. LTO streamout)
> comes very early, one could just set the force_output flag
> in this file without further checks or omp-offload.c changes
> (b) Alternatively, one could check that it really works by using
>   gcc_assert (symtab_node::get (it));
> in either or both files.
> (c) or assuming that some optimization worked, one could use:
>   if (!symtab_node::get (it))
>     continue;
>
> The patch does (c) as trimming it to (b) or (a) is trivial.

I prefer the patch as is; output_offload_tables() isn't actually that early,
there are all the early optimizations before that.
And if we don't optimize it early enough, perhaps we need a targeted unused
target variable removal subpass (early ipa).

> OK? What about backporting to GCC 10?

Ok. Please wait a few days before backporting.

	Jakub
It turned out that this patch fails with LTO and partitions,
causing failures at runtime such as
  libgomp: Duplicate node
via libgomp/splay-tree.c's splay_tree_insert.

In the test case, the problem occurred for functions, namely
main._omp_fn.* on the host.
If the code is run in LTO context, the filtering-out should
have already happened via the stream-out/stream-in and hence no
additional check is needed for omp_finish_file.

OK?

Tobias

PS: The streaming-in is done via:
  input_offload_tables (/* do_force_output = */ !flag_ltrans);

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
On Tue, Jun 09, 2020 at 04:02:19PM +0200, Tobias Burnus wrote:
> It turned out that this patch fails with LTO and partitions,
> causing fails at runtime such as
>   libgomp: Duplicate node
> via libgomp/splay-tree.c's splay_tree_insert.
>
> In the test case, the problem occurred for functions - namely
> main._omp_fn.* on the host.
> If the code is run in LTO context, the filtering-out should
> have already happen via the stream-out/stream-in and hence no
> additional check is needed for omp_finish_file.
>
> OK?

Was this caught in the testsuite, or do you have some short testcase
that could be used in the testsuite?

> Tobias
>
> PS: The streaming-in is done via:
>   input_offload_tables (/* do_force_output = */ !flag_ltrans);

> openmp: ensure variables in offload table are streamed out (PRs 94848 + 95551)
>
> gcc/ChangeLog:
>
> 	* omp-offload.c (add_decls_addresses_to_decl_constructor,
> 	omp_finish_file): With in_lto_p, stream out all offload-table
> 	items even if the symtab_node does not exist.

Ok with or without the testcase.

	Jakub
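For readers following the ChangeLog entry above, here is a minimal standalone sketch of one plausible reading of "With in_lto_p, stream out all offload-table items even if the symtab_node does not exist". It does not use GCC internals: `MockSymtab`, `entries_to_emit` and the `in_lto` parameter are hypothetical names invented for illustration, and the real condition in omp-offload.c may differ.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the symbol table: the set of names that
// still have a node.  This is NOT GCC's symtab_node API.
using MockSymtab = std::unordered_set<std::string>;

// Return the offload-table entries that would be emitted.
// Outside LTO, entries whose node was removed are skipped.  With
// in_lto set, every entry is kept, on the assumption that the
// stream-out/stream-in step already filtered the table.
std::vector<std::string>
entries_to_emit (const std::vector<std::string> &offload_table,
                 const MockSymtab &symtab, bool in_lto)
{
  std::vector<std::string> out;
  for (const std::string &name : offload_table)
    {
      if (!in_lto && symtab.count (name) == 0)
        continue;  // Node was optimized away; skip the entry.
      out.push_back (name);
    }
  // The result can never be longer than the input table.
  assert (out.size () <= offload_table.size ());
  return out;
}
```

The point of the sketch is only that the filter is applied once: either at stream-out time, or in the non-LTO path, but not a second time on a table that has already been filtered.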
Hi!

On 2020-06-09T16:11:03+0200, Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> On Tue, Jun 09, 2020 at 04:02:19PM +0200, Tobias Burnus wrote:
>> It turned out that this patch fails with LTO and partitions,
>> causing fails at runtime such as
>>   libgomp: Duplicate node
>> via libgomp/splay-tree.c's splay_tree_insert.
>>
>> In the test case, the problem occurred for functions - namely
>> main._omp_fn.* on the host.
>> If the code is run in LTO context, the filtering-out should
>> have already happen via the stream-out/stream-in and hence no
>> additional check is needed for omp_finish_file.
>
> Was this caught in the testsuite

I saw it show up as:

    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)
    [-PASS:-]{+WARNING: program timed out.+}
    {+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors)
    [-PASS:-]{+WARNING: program timed out.+}
    {+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test

    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)
    [-PASS:-]{+WARNING: program timed out.+}
    {+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors)
    [-PASS:-]{+WARNING: program timed out.+}
    {+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test

Same for C++.

Not sure if that constitutes sufficient testsuite coverage.

> do you have some short testcase
> that could be used in the testsuite?

Can we do some LTO-compile-time tree scanning?

Grüße
 Thomas
openmp: ensure variables in offload table are streamed out (PRs 94848 + 95551)

gcc/ChangeLog:

	PR lto/94848
	PR middle-end/95551
	* omp-offload.c (add_decls_addresses_to_decl_constructor,
	omp_finish_file): Skip removed items.
	* lto-cgraph.c (output_offload_tables): Likewise; set force_output
	to this node for variables and functions.

libgomp/ChangeLog:

	PR lto/94848
	PR middle-end/95551
	* testsuite/libgomp.fortran/target-var.f90: New test.

 gcc/lto-cgraph.c                                 |  8 ++++++
 gcc/omp-offload.c                                | 12 ++++++++-
 libgomp/testsuite/libgomp.fortran/target-var.f90 | 32 ++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index a671c671fa7..93a99f3465b 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1069,6 +1069,10 @@ output_offload_tables (void)
 
   for (unsigned i = 0; i < vec_safe_length (offload_funcs); i++)
     {
+      symtab_node *node = symtab_node::get ((*offload_funcs)[i]);
+      if (!node)
+        continue;
+      node->force_output = true;
       streamer_write_enum (ob->main_stream, LTO_symtab_tags,
                            LTO_symtab_last_tag, LTO_symtab_unavail_node);
       lto_output_fn_decl_ref (ob->decl_state, ob->main_stream,
@@ -1077,6 +1081,10 @@ output_offload_tables (void)
 
   for (unsigned i = 0; i < vec_safe_length (offload_vars); i++)
     {
+      symtab_node *node = symtab_node::get ((*offload_vars)[i]);
+      if (!node)
+        continue;
+      node->force_output = true;
       streamer_write_enum (ob->main_stream, LTO_symtab_tags,
                            LTO_symtab_last_tag, LTO_symtab_variable);
       lto_output_var_decl_ref (ob->decl_state, ob->main_stream,
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index b2df91a5724..4e44cfc9d0a 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -125,6 +125,10 @@ add_decls_addresses_to_decl_constructor (vec<tree, va_gc> *v_decls,
 #endif
                     && lookup_attribute ("omp declare target link",
                                          DECL_ATTRIBUTES (it));
+      /* See also omp_finish_file and output_offload_tables in lto-cgraph.c.  */
+      if (!symtab_node::get (it))
+        continue;
+
       tree size = NULL_TREE;
       if (is_var)
         size = fold_convert (const_ptr_type_node, DECL_SIZE_UNIT (it));
@@ -341,7 +345,7 @@ omp_finish_file (void)
       add_decls_addresses_to_decl_constructor (offload_vars, v_v);
 
       tree vars_decl_type = build_array_type_nelts (pointer_sized_int_node,
-                                                    num_vars * 2);
+                                                    vec_safe_length (v_v));
       tree funcs_decl_type = build_array_type_nelts (pointer_sized_int_node,
                                                      num_funcs);
       SET_TYPE_ALIGN (vars_decl_type, TYPE_ALIGN (pointer_sized_int_node));
@@ -376,11 +380,17 @@ omp_finish_file (void)
       for (unsigned i = 0; i < num_funcs; i++)
         {
           tree it = (*offload_funcs)[i];
+          /* See also add_decls_addresses_to_decl_constructor
+             and output_offload_tables in lto-cgraph.c.  */
+          if (!symtab_node::get (it))
+            continue;
           targetm.record_offload_symbol (it);
         }
       for (unsigned i = 0; i < num_vars; i++)
         {
           tree it = (*offload_vars)[i];
+          if (!symtab_node::get (it))
+            continue;
 #ifdef ACCEL_COMPILER
           if (DECL_HAS_VALUE_EXPR_P (it)
               && lookup_attribute ("omp declare target link",
diff --git a/libgomp/testsuite/libgomp.fortran/target-var.f90 b/libgomp/testsuite/libgomp.fortran/target-var.f90
new file mode 100644
index 00000000000..5e5ccd47c96
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-var.f90
@@ -0,0 +1,32 @@
+! { dg-additional-options "-O3" }
+!
+! With -O3 the static local variable A.10 generated for
+! the array constructor [-2, -4, ..., -20] is optimized
+! away - which has to be handled in the offload_vars table.
+!
+program main
+  implicit none (type, external)
+  integer :: j
+  integer, allocatable :: A(:)
+
+  A = [(3*j, j=1, 10)]
+  call bar (A)
+  deallocate (A)
+contains
+  subroutine bar (array)
+    integer :: i
+    integer :: array(:)
+
+    !$omp target map(from:array)
+    !$acc parallel copyout(array)
+    array = [(-2*i, i = 1, size(array))]
+    !$omp do private(array)
+    !$acc loop gang private(array)
+    do i = 1, 10
+      array(i) = 9*i
+    end do
+    if (any (array /= [(-2*i, i = 1, 10)])) error stop 2
+    !$omp end target
+    !$acc end parallel
+  end subroutine bar
+end
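As an aside on the `num_vars * 2` to `vec_safe_length (v_v)` change in the patch above: the variable table holds two entries per variable (address and size), so once removed variables are skipped, the array length has to follow what was actually pushed. The following is a standalone sketch of that reasoning with mock types. `MockVar`, `has_node` and `build_var_table` are hypothetical names for illustration, not GCC code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an offload variable.  has_node models
// whether symtab_node::get would still return a node for it.
struct MockVar
{
  std::uintptr_t addr;
  std::size_t size;
  bool has_node;
};

// Build a flat table with two entries per surviving variable:
// its address followed by its size.  Variables without a node
// contribute nothing, so the table can be shorter than 2 * vars.size ().
std::vector<std::uintptr_t>
build_var_table (const std::vector<MockVar> &vars)
{
  std::vector<std::uintptr_t> table;
  for (const MockVar &v : vars)
    {
      if (!v.has_node)
        continue;  // Removed variable: skip both of its entries.
      table.push_back (v.addr);
      table.push_back (static_cast<std::uintptr_t> (v.size));
    }
  return table;
}
```

With three variables of which one was removed, the table has four entries rather than six. Sizing the array from the original count would leave two trailing slots that no entry fills.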
Hi Jakub,

On 6/8/20 5:30 PM, Jakub Jelinek wrote:
> I really don't see what is special exactly on TREE_READONLY DECL_ARTIFICIAL

I have now split off the missed-optimization task to a new
PR, PR95583, to be handled in a proper way instead of trying
to cook up a hackish special-case version.

This patch now simply sets the force_output flag.

(a) As output_offload_tables() (i.e. LTO streamout)
comes very early, one could just set the force_output flag
in this file without further checks or omp-offload.c changes
(b) Alternatively, one could check that it really works by using
  gcc_assert (symtab_node::get (it));
in either or both files.
(c) or assuming that some optimization worked, one could use:
  if (!symtab_node::get (it))
    continue;

The patch does (c) as trimming it to (b) or (a) is trivial.

All three should currently give the same result: the assert checks
for this, and the "if (...)" is future-proof against new optimizations,
but I fear that it makes no difference until passes are added before
output_offload_tables() (→ new PR).

(The omp_finish_file comes late enough, but as the LTO has been written
before, it does not help.)

OK? What about backporting to GCC 10?

Tobias
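To illustrate the idea behind "simply sets the force_output flag" combined with option (c), here is a small standalone model. It assumes a simplified removal pass that drops every node that is neither referenced nor flagged. `MockNode`, `mark_offload_entries` and `remove_unused` are hypothetical names, and GCC's real symbol-table handling is considerably more involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol-table node.
struct MockNode
{
  std::string name;
  bool referenced;
  bool force_output;
};

// Option (c): set force_output on every offload-table entry that still
// has a node, and silently skip entries whose node is already gone.
// Returns how many nodes were marked.
std::size_t
mark_offload_entries (std::vector<MockNode> &nodes,
                      const std::vector<std::string> &offload_table)
{
  std::size_t marked = 0;
  for (const std::string &name : offload_table)
    {
      auto it = std::find_if (nodes.begin (), nodes.end (),
                              [&name] (const MockNode &n)
                              { return n.name == name; });
      if (it == nodes.end ())
        continue;  // Already removed; nothing to keep alive.
      it->force_output = true;
      ++marked;
    }
  return marked;
}

// Simplified model of a later unused-symbol removal: a node survives
// if it is referenced or has force_output set.
void
remove_unused (std::vector<MockNode> &nodes)
{
  nodes.erase (std::remove_if (nodes.begin (), nodes.end (),
                               [] (const MockNode &n)
                               { return !n.referenced && !n.force_output; }),
               nodes.end ());
}
```

In this model an unreferenced table entry such as A.10 survives the removal once it is marked, while an unreferenced node that is not in the table is dropped. Option (b) would correspond to replacing the `continue` with an assertion that the node exists.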