Message ID | 20200908065104.GA21827@delia |
---|---|
State | New |
Headers | show |
Series | [libatomic] Add nvptx support | expand |
On 9/8/20 8:51 AM, Tom de Vries wrote:
> Add nvptx support to libatomic.
I tried it on powerpc64le-none-linux-gnu and that solves the
__sync_val_compare_and_swap_16 issue, I reported at
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553070.html
However, when trying Jakub's example (see below; syntax fixed
version), https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553142.html
it (still) fails with:
atomic.c: In function 'main._omp_fn.0':
atomic.c:6:11: internal compiler error: in write_fn_proto, at config/nvptx/nvptx.c:913
6 | #pragma omp target
Tobias
PS: The 'atomic.c' testcase:
__uint128_t v;
#pragma omp declare target (v)
int
main ()
{
#pragma omp target
{
__atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
__atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
__uint128_t exp = 2;
__atomic_compare_exchange_n (&v, &exp, 7, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE);
}
}
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Hi Tom, On 9/8/20 5:05 PM, Tobias Burnus wrote: > On 9/8/20 8:51 AM, Tom de Vries wrote: >> PR target/96964 >> * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New >> expansion. >> * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin. I have your patch applied on a current mainline powerpc64le-none-linux-gnu + nvptx offloading build. And I observe the following fails – which seems to be new and related to your patch (but I have not confirmed it by reverting your libatomic patch). Required option for the fail: "-O2 -ftracer", hence, only the "-O3 ..." testsuite builds fail. (-ftracer = "Perform tail duplication to enlarge superblock size.") during RTL pass: mach asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at config/nvptx/nvptx.c:3293 0x10bf9f13 nvptx_find_par gcc/config/nvptx/nvptx.c:3293 0x10bf9b97 nvptx_find_par gcc/config/nvptx/nvptx.c:3320 0x10bf9b97 nvptx_find_par gcc/config/nvptx/nvptx.c:3320 ... The ICE occurs for the second assert of: case CODE_FOR_nvptx_join: /* A loop tail. Finish the current loop and return to parent. */ { unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0)); gcc_assert (par->mask == mask); gcc_assert (par->join_block == NULL); gdb shows: (gdb) p debug_bb(par->join_block ) (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK) (insn 31 213 204 24 (unspec_volatile:SI [ (const_int 4 [0x4]) ] UNSPECV_JOIN) "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237 {nvptx_join} (nil)) (jump_insn 204 31 205 24 (set (pc) (label_ref 198)) 121 {jump} (nil) -> 198) That affects the testcases: libgomp.oacc-fortran/asyncwait-1.f90 libgomp.oacc-fortran/asyncwait-2.f90 libgomp.oacc-fortran/asyncwait-3.f90 libgomp.oacc-fortran/atomic_capture-1.f90 libgomp.oacc-fortran/atomic_update-1.f90 libgomp.oacc-fortran/classtypes-1.f95 libgomp.oacc-fortran/collapse-1.f90 libgomp.oacc-fortran/collapse-2.f90 libgomp.oacc-fortran/collapse-3.f90 libgomp.oacc-fortran/collapse-4.f90 libgomp.oacc-fortran/collapse-5.f90 libgomp.oacc-fortran/collapse-6.f90 libgomp.oacc-fortran/collapse-7.f90 libgomp.oacc-fortran/collapse-8.f90 libgomp.oacc-fortran/combined-directives-1.f90 libgomp.oacc-fortran/combined-reduction.f90 libgomp.oacc-fortran/common-block-1.f90 libgomp.oacc-fortran/common-block-2.f90 libgomp.oacc-fortran/common-block-3.f90 libgomp.oacc-fortran/deep-copy-1.f90 libgomp.oacc-fortran/deep-copy-3.f90 libgomp.oacc-fortran/deep-copy-4.f90 libgomp.oacc-fortran/deep-copy-5.f90 libgomp.oacc-fortran/deep-copy-6-no_finalize.F90 libgomp.oacc-fortran/deep-copy-6.f90 libgomp.oacc-fortran/deep-copy-7.f90 libgomp.oacc-fortran/deep-copy-8.f90 libgomp.oacc-fortran/derived-type-1.f90 libgomp.oacc-fortran/host_data-2.f90 libgomp.oacc-fortran/host_data-3.f libgomp.oacc-fortran/host_data-4.f90 libgomp.oacc-fortran/implicit-firstprivate-ref.f90 libgomp.oacc-fortran/lib-14.f90 libgomp.oacc-fortran/map-1.f90 libgomp.oacc-fortran/nested-function-1.f90 libgomp.oacc-fortran/nested-function-2.f90 libgomp.oacc-fortran/nested-function-3.f90 libgomp.oacc-fortran/no_create-3.F90 libgomp.oacc-fortran/optional-data-copyin.f90 libgomp.oacc-fortran/optional-data-copyout.f90 libgomp.oacc-fortran/optional-data-enter-exit.f90 libgomp.oacc-fortran/optional-declare.f90 libgomp.oacc-fortran/optional-firstprivate.f90 libgomp.oacc-fortran/optional-reduction.f90 libgomp.oacc-fortran/optional-update-device.f90 libgomp.oacc-fortran/optional-update-host.f90 libgomp.oacc-fortran/parallel-dims.f90 libgomp.oacc-fortran/parallel-loop-1.f90 libgomp.oacc-fortran/pr81352.f90 libgomp.oacc-fortran/pr84028.f90 libgomp.oacc-fortran/reduction-1.f90 libgomp.oacc-fortran/reduction-2.f90 libgomp.oacc-fortran/reduction-3.f90 libgomp.oacc-fortran/reduction-4.f90 libgomp.oacc-fortran/reduction-5.f90 libgomp.oacc-fortran/reduction-6.f90 libgomp.oacc-fortran/reduction-7.f90 libgomp.oacc-fortran/reduction-8.f90 libgomp.oacc-fortran/routine-1.f90 libgomp.oacc-fortran/routine-2.f90 libgomp.oacc-fortran/routine-3.f90 libgomp.oacc-fortran/routine-4.f90 libgomp.oacc-fortran/routine-7.f90 libgomp.oacc-fortran/routine-9.f90 libgomp.oacc-fortran/subarrays-1.f90 libgomp.oacc-fortran/subarrays-2.f90 libgomp.oacc-fortran/update-2.f90 Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
On 9/9/20 2:36 PM, Tobias Burnus wrote: > Hi Tom, > > On 9/8/20 5:05 PM, Tobias Burnus wrote: > >> On 9/8/20 8:51 AM, Tom de Vries wrote: >>> PR target/96964 >>> * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New >>> expansion. >>> * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin. > > I have your patch applied on a current mainline powerpc64le-none-linux-gnu > + nvptx offloading build. Thanks for trying this out. > And I observe the following fails – which seems > to be new and related to your patch (but I have not confirmed it by > reverting your libatomic patch). > Could you confirm that? Meanwhile, I'll try to reproduce on x86_64. > Required option for the fail: "-O2 -ftracer", > hence, only the "-O3 ..." testsuite builds fail. > (-ftracer = "Perform tail duplication to enlarge superblock size.") > > > during RTL pass: mach > asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at > config/nvptx/nvptx.c:3293 > 0x10bf9f13 nvptx_find_par > gcc/config/nvptx/nvptx.c:3293 > 0x10bf9b97 nvptx_find_par > gcc/config/nvptx/nvptx.c:3320 > 0x10bf9b97 nvptx_find_par > gcc/config/nvptx/nvptx.c:3320 > ... > > > The ICE occurs for the second assert of: > case CODE_FOR_nvptx_join: > /* A loop tail. Finish the current loop and return to > parent. */ > { > unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0)); > > gcc_assert (par->mask == mask); > gcc_assert (par->join_block == NULL); > > gdb shows: > (gdb) p debug_bb(par->join_block ) > (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK) > (insn 31 213 204 24 (unspec_volatile:SI [ > (const_int 4 [0x4]) > ] UNSPECV_JOIN) > "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237 > {nvptx_join} > (nil)) > (jump_insn 204 31 205 24 (set (pc) > (label_ref 198)) 121 {jump} > (nil) > -> 198) > Yep, code duplication works against the matching of fork/join, it's not the first time we see this. Usually the fix is to make an optimization pass conservative with respect to these fork/join regions, but AFAICT, ftracer already has such code in ignore_bb_p that tests gimple_call_internal_unique_p. So, perhaps the ftracer pass is the trigger, but not the pass that does the problematic transformation? Just a guess at this point. Thanks, - Tom > > That affects the testcases: > libgomp.oacc-fortran/asyncwait-1.f90 > libgomp.oacc-fortran/asyncwait-2.f90 > libgomp.oacc-fortran/asyncwait-3.f90 > libgomp.oacc-fortran/atomic_capture-1.f90 > libgomp.oacc-fortran/atomic_update-1.f90 > libgomp.oacc-fortran/classtypes-1.f95 > libgomp.oacc-fortran/collapse-1.f90 > libgomp.oacc-fortran/collapse-2.f90 > libgomp.oacc-fortran/collapse-3.f90 > libgomp.oacc-fortran/collapse-4.f90 > libgomp.oacc-fortran/collapse-5.f90 > libgomp.oacc-fortran/collapse-6.f90 > libgomp.oacc-fortran/collapse-7.f90 > libgomp.oacc-fortran/collapse-8.f90 > libgomp.oacc-fortran/combined-directives-1.f90 > libgomp.oacc-fortran/combined-reduction.f90 > libgomp.oacc-fortran/common-block-1.f90 > libgomp.oacc-fortran/common-block-2.f90 > libgomp.oacc-fortran/common-block-3.f90 > libgomp.oacc-fortran/deep-copy-1.f90 > libgomp.oacc-fortran/deep-copy-3.f90 > libgomp.oacc-fortran/deep-copy-4.f90 > libgomp.oacc-fortran/deep-copy-5.f90 > libgomp.oacc-fortran/deep-copy-6-no_finalize.F90 > libgomp.oacc-fortran/deep-copy-6.f90 > libgomp.oacc-fortran/deep-copy-7.f90 > libgomp.oacc-fortran/deep-copy-8.f90 > libgomp.oacc-fortran/derived-type-1.f90 > libgomp.oacc-fortran/host_data-2.f90 > libgomp.oacc-fortran/host_data-3.f > libgomp.oacc-fortran/host_data-4.f90 > libgomp.oacc-fortran/implicit-firstprivate-ref.f90 > libgomp.oacc-fortran/lib-14.f90 > libgomp.oacc-fortran/map-1.f90 > libgomp.oacc-fortran/nested-function-1.f90 > libgomp.oacc-fortran/nested-function-2.f90 > libgomp.oacc-fortran/nested-function-3.f90 > libgomp.oacc-fortran/no_create-3.F90 > libgomp.oacc-fortran/optional-data-copyin.f90 > libgomp.oacc-fortran/optional-data-copyout.f90 > libgomp.oacc-fortran/optional-data-enter-exit.f90 > libgomp.oacc-fortran/optional-declare.f90 > libgomp.oacc-fortran/optional-firstprivate.f90 > libgomp.oacc-fortran/optional-reduction.f90 > libgomp.oacc-fortran/optional-update-device.f90 > libgomp.oacc-fortran/optional-update-host.f90 > libgomp.oacc-fortran/parallel-dims.f90 > libgomp.oacc-fortran/parallel-loop-1.f90 > libgomp.oacc-fortran/pr81352.f90 > libgomp.oacc-fortran/pr84028.f90 > libgomp.oacc-fortran/reduction-1.f90 > libgomp.oacc-fortran/reduction-2.f90 > libgomp.oacc-fortran/reduction-3.f90 > libgomp.oacc-fortran/reduction-4.f90 > libgomp.oacc-fortran/reduction-5.f90 > libgomp.oacc-fortran/reduction-6.f90 > libgomp.oacc-fortran/reduction-7.f90 > libgomp.oacc-fortran/reduction-8.f90 > libgomp.oacc-fortran/routine-1.f90 > libgomp.oacc-fortran/routine-2.f90 > libgomp.oacc-fortran/routine-3.f90 > libgomp.oacc-fortran/routine-4.f90 > libgomp.oacc-fortran/routine-7.f90 > libgomp.oacc-fortran/routine-9.f90 > libgomp.oacc-fortran/subarrays-1.f90 > libgomp.oacc-fortran/subarrays-2.f90 > libgomp.oacc-fortran/update-2.f90 > > Tobias > > ----------------- > Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / > Germany > Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, > Alexander Walter
On 9/9/20 3:15 PM, Tom de Vries wrote: > On 9/9/20 2:36 PM, Tobias Burnus wrote: >> Hi Tom, >> >> On 9/8/20 5:05 PM, Tobias Burnus wrote: >> >>> On 9/8/20 8:51 AM, Tom de Vries wrote: >>>> PR target/96964 >>>> * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New >>>> expansion. >>>> * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin. >> >> I have your patch applied on a current mainline powerpc64le-none-linux-gnu >> + nvptx offloading build. > > Thanks for trying this out. > >> And I observe the following fails – which seems >> to be new and related to your patch (but I have not confirmed it by >> reverting your libatomic patch). >> > > Could you confirm that? > > Meanwhile, I'll try to reproduce on x86_64. > >> Required option for the fail: "-O2 -ftracer", >> hence, only the "-O3 ..." testsuite builds fail. >> (-ftracer = "Perform tail duplication to enlarge superblock size.") >> >> >> during RTL pass: mach >> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at >> config/nvptx/nvptx.c:3293 >> 0x10bf9f13 nvptx_find_par >> gcc/config/nvptx/nvptx.c:3293 >> 0x10bf9b97 nvptx_find_par >> gcc/config/nvptx/nvptx.c:3320 >> 0x10bf9b97 nvptx_find_par >> gcc/config/nvptx/nvptx.c:3320 >> ... >> >> >> The ICE occurs for the second assert of: >> case CODE_FOR_nvptx_join: >> /* A loop tail. Finish the current loop and return to >> parent. */ >> { >> unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0)); >> >> gcc_assert (par->mask == mask); >> gcc_assert (par->join_block == NULL); >> >> gdb shows: >> (gdb) p debug_bb(par->join_block ) >> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK) >> (insn 31 213 204 24 (unspec_volatile:SI [ >> (const_int 4 [0x4]) >> ] UNSPECV_JOIN) >> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237 >> {nvptx_join} >> (nil)) >> (jump_insn 204 31 205 24 (set (pc) >> (label_ref 198)) 121 {jump} >> (nil) >> -> 198) >> > > Yep, code duplication works against the matching of fork/join, it's not > the first time we see this. > > Usually the fix is to make an optimization pass conservative with > respect to these fork/join regions, but AFAICT, ftracer already has such > code in ignore_bb_p that tests gimple_call_internal_unique_p. > > So, perhaps the ftracer pass is the trigger, but not the pass that does > the problematic transformation? Just a guess at this point. > I can reproduce it, and it's indeed the ftracer pass that does the duplication. So, the question is why doesn't ignore_bb_p work. Thanks, - Tom > >> >> That affects the testcases: >> libgomp.oacc-fortran/asyncwait-1.f90 >> libgomp.oacc-fortran/asyncwait-2.f90 >> libgomp.oacc-fortran/asyncwait-3.f90 >> libgomp.oacc-fortran/atomic_capture-1.f90 >> libgomp.oacc-fortran/atomic_update-1.f90 >> libgomp.oacc-fortran/classtypes-1.f95 >> libgomp.oacc-fortran/collapse-1.f90 >> libgomp.oacc-fortran/collapse-2.f90 >> libgomp.oacc-fortran/collapse-3.f90 >> libgomp.oacc-fortran/collapse-4.f90 >> libgomp.oacc-fortran/collapse-5.f90 >> libgomp.oacc-fortran/collapse-6.f90 >> libgomp.oacc-fortran/collapse-7.f90 >> libgomp.oacc-fortran/collapse-8.f90 >> libgomp.oacc-fortran/combined-directives-1.f90 >> libgomp.oacc-fortran/combined-reduction.f90 >> libgomp.oacc-fortran/common-block-1.f90 >> libgomp.oacc-fortran/common-block-2.f90 >> libgomp.oacc-fortran/common-block-3.f90 >> libgomp.oacc-fortran/deep-copy-1.f90 >> libgomp.oacc-fortran/deep-copy-3.f90 >> libgomp.oacc-fortran/deep-copy-4.f90 >> libgomp.oacc-fortran/deep-copy-5.f90 >> libgomp.oacc-fortran/deep-copy-6-no_finalize.F90 >> libgomp.oacc-fortran/deep-copy-6.f90 >> libgomp.oacc-fortran/deep-copy-7.f90 >> libgomp.oacc-fortran/deep-copy-8.f90 >> libgomp.oacc-fortran/derived-type-1.f90 >> libgomp.oacc-fortran/host_data-2.f90 >> libgomp.oacc-fortran/host_data-3.f >> libgomp.oacc-fortran/host_data-4.f90 >> libgomp.oacc-fortran/implicit-firstprivate-ref.f90 >> libgomp.oacc-fortran/lib-14.f90 >> libgomp.oacc-fortran/map-1.f90 >> libgomp.oacc-fortran/nested-function-1.f90 >> libgomp.oacc-fortran/nested-function-2.f90 >> libgomp.oacc-fortran/nested-function-3.f90 >> libgomp.oacc-fortran/no_create-3.F90 >> libgomp.oacc-fortran/optional-data-copyin.f90 >> libgomp.oacc-fortran/optional-data-copyout.f90 >> libgomp.oacc-fortran/optional-data-enter-exit.f90 >> libgomp.oacc-fortran/optional-declare.f90 >> libgomp.oacc-fortran/optional-firstprivate.f90 >> libgomp.oacc-fortran/optional-reduction.f90 >> libgomp.oacc-fortran/optional-update-device.f90 >> libgomp.oacc-fortran/optional-update-host.f90 >> libgomp.oacc-fortran/parallel-dims.f90 >> libgomp.oacc-fortran/parallel-loop-1.f90 >> libgomp.oacc-fortran/pr81352.f90 >> libgomp.oacc-fortran/pr84028.f90 >> libgomp.oacc-fortran/reduction-1.f90 >> libgomp.oacc-fortran/reduction-2.f90 >> libgomp.oacc-fortran/reduction-3.f90 >> libgomp.oacc-fortran/reduction-4.f90 >> libgomp.oacc-fortran/reduction-5.f90 >> libgomp.oacc-fortran/reduction-6.f90 >> libgomp.oacc-fortran/reduction-7.f90 >> libgomp.oacc-fortran/reduction-8.f90 >> libgomp.oacc-fortran/routine-1.f90 >> libgomp.oacc-fortran/routine-2.f90 >> libgomp.oacc-fortran/routine-3.f90 >> libgomp.oacc-fortran/routine-4.f90 >> libgomp.oacc-fortran/routine-7.f90 >> libgomp.oacc-fortran/routine-9.f90 >> libgomp.oacc-fortran/subarrays-1.f90 >> libgomp.oacc-fortran/subarrays-2.f90 >> libgomp.oacc-fortran/update-2.f90 >> >> Tobias >> >> ----------------- >> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / >> Germany >> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, >> Alexander Walter
On 9/9/20 4:14 PM, Tom de Vries wrote: > On 9/9/20 3:15 PM, Tom de Vries wrote: >> On 9/9/20 2:36 PM, Tobias Burnus wrote: >>> Hi Tom, >>> >>> On 9/8/20 5:05 PM, Tobias Burnus wrote: >>> >>>> On 9/8/20 8:51 AM, Tom de Vries wrote: >>>>> PR target/96964 >>>>> * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New >>>>> expansion. >>>>> * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin. >>> >>> I have your patch applied on a current mainline powerpc64le-none-linux-gnu >>> + nvptx offloading build. >> >> Thanks for trying this out. >> >>> And I observe the following fails – which seems >>> to be new and related to your patch (but I have not confirmed it by >>> reverting your libatomic patch). >>> >> >> Could you confirm that? >> >> Meanwhile, I'll try to reproduce on x86_64. >> >>> Required option for the fail: "-O2 -ftracer", >>> hence, only the "-O3 ..." testsuite builds fail. >>> (-ftracer = "Perform tail duplication to enlarge superblock size.") >>> >>> >>> during RTL pass: mach >>> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at >>> config/nvptx/nvptx.c:3293 >>> 0x10bf9f13 nvptx_find_par >>> gcc/config/nvptx/nvptx.c:3293 >>> 0x10bf9b97 nvptx_find_par >>> gcc/config/nvptx/nvptx.c:3320 >>> 0x10bf9b97 nvptx_find_par >>> gcc/config/nvptx/nvptx.c:3320 >>> ... >>> >>> >>> The ICE occurs for the second assert of: >>> case CODE_FOR_nvptx_join: >>> /* A loop tail. Finish the current loop and return to >>> parent. */ >>> { >>> unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0)); >>> >>> gcc_assert (par->mask == mask); >>> gcc_assert (par->join_block == NULL); >>> >>> gdb shows: >>> (gdb) p debug_bb(par->join_block ) >>> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK) >>> (insn 31 213 204 24 (unspec_volatile:SI [ >>> (const_int 4 [0x4]) >>> ] UNSPECV_JOIN) >>> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237 >>> {nvptx_join} >>> (nil)) >>> (jump_insn 204 31 205 24 (set (pc) >>> (label_ref 198)) 121 {jump} >>> (nil) >>> -> 198) >>> >> >> Yep, code duplication works against the matching of fork/join, it's not >> the first time we see this. >> >> Usually the fix is to make an optimization pass conservative with >> respect to these fork/join regions, but AFAICT, ftracer already has such >> code in ignore_bb_p that tests gimple_call_internal_unique_p. >> >> So, perhaps the ftracer pass is the trigger, but not the pass that does >> the problematic transformation? Just a guess at this point. >> > > I can reproduce it, and it's indeed the ftracer pass that does the > duplication. So, the question is why doesn't ignore_bb_p work. Filed PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97000 for this. Thanks, - Tom
Hi Tom, On 9/9/20 3:15 PM, Tom de Vries wrote: >> And I observe the following fails – which seems >> to be new and related to your patch (but I have not confirmed it by >> reverting your libatomic patch). >> > Could you confirm that? It is an indenpend issue; it is newish and I can reproduce it on x86-64-gnu-linux (not with -O2 but with -O3 -ftracer). Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Hi all, hello Tom,
On 9/8/20 8:51 AM, Tom de Vries wrote:
> Add nvptx support to libatomic.
I have tried tried this patch together with the latest patch for/of PR97000
for tree-cfgcleanup.c on tree-cfgcleanup.c:
Now, I only see the expected failures and the compare_and_swap issue in
the testsuite is gone (libgomp.c-c++-common/reduction-17.c)
:-)
(Jakub's atomic test fails with a link error:
"__atomic_compare_exchange_16" (the build I have has
--disable-libatomic, hence, I cannot check this with
-foffload=-latomic right now; something to check later.)
Thanks,
Tobias
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
On 9/8/20 8:51 AM, Tom de Vries wrote: > Hi, > > Add nvptx support to libatomic. > > Given that atomic_test_and_set is not implemented for nvptx (PR96964), the > compiler translates __atomic_test_and_set falling back onto the "Failing all > else, assume a single threaded environment and simply perform the operation" > case in expand_atomic_test_and_set, so it doesn't map onto an actual atomic > operation. > > Still, that counts as supported for the configure test of libatomic, so we > end up with HAVE_ATOMIC_TAS_1/2/4/8/16 == 1, and the corresponding > __atomic_test_and_set_1/2/4/8/16 in libatomic all using that non-atomic > implementation. > > Fix this by adding an atomic_test_and_set expansion for nvptx, that uses > libatomics __atomic_test_and_set_1. > > This again makes the configure tests for HAVE_ATOMIC_TAS_1/2/4/8/16 fail, so > instead we use this case in tas_n.c: > ... > /* If this type is smaller than word-sized, fall back to a word-sized > compare-and-swap loop. */ > bool > SIZE(libat_test_and_set) (UTYPE *mptr, int smodel) > ... > which for __atomic_test_and_set_8 uses INVERT_MASK_8. > > Add INVERT_MASK_8 in libatomic_i.h, as well as MASK_8. > > Tested libatomic testsuite on nvptx. > > Non-target bits (sync-builtins.def, libatomic_i.h) OK for trunk? > > Any other comments? > > Thanks, > - Tom > > [libatomic] Add nvptx support > > gcc/ChangeLog: > > PR target/96964 > * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New > expansion. > * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin. > I realized after reading the header comment of write_fn_proto_from_insn in nvptx.c that I could drop the sync-builtins.def bit. So, committed without the sync-builtins.def change. Thanks, - Tom > libatomic/ChangeLog: > > PR target/96898 > * configure.tgt: Add nvptx. > * libatomic_i.h (MASK_8, INVERT_MASK_8): New macro definition. > * config/nvptx/host-config.h: New file. > * config/nvptx/lock.c: New file. > > --- > gcc/config/nvptx/nvptx.md | 16 +++++++++++ > gcc/sync-builtins.def | 2 ++ > libatomic/config/nvptx/host-config.h | 56 ++++++++++++++++++++++++++++++++++++ > libatomic/config/nvptx/lock.c | 56 ++++++++++++++++++++++++++++++++++++ > libatomic/configure.tgt | 3 ++ > libatomic/libatomic_i.h | 2 ++ > 6 files changed, 135 insertions(+) > > diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md > index 4168190fa42..6178e6a0f77 100644 > --- a/gcc/config/nvptx/nvptx.md > +++ b/gcc/config/nvptx/nvptx.md > @@ -1667,6 +1667,22 @@ > "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;" > [(set_attr "atomic" "true")]) > > +(define_expand "atomic_test_and_set" > + [(match_operand:SI 0 "nvptx_register_operand") ;; bool success output > + (match_operand:QI 1 "memory_operand") ;; memory > + (match_operand:SI 2 "const_int_operand")] ;; model > + "" > +{ > + rtx libfunc; > + rtx addr; > + libfunc = init_one_libfunc ("__atomic_test_and_set_1"); > + addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0)); > + emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode, > + addr, ptr_mode, > + operands[2], SImode); > + DONE; > +}) > + > (define_insn "nvptx_barsync" > [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri") > (match_operand:SI 1 "const_int_operand")] > diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def > index 156a13ce0f8..b802257bd1a 100644 > --- a/gcc/sync-builtins.def > +++ b/gcc/sync-builtins.def > @@ -261,6 +261,8 @@ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, "__sync_synchronize", > > DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET, "__atomic_test_and_set", > BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) > +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET_1, "__atomic_test_and_set_1", > + BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) > > DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_CLEAR, "__atomic_clear", BT_FN_VOID_VPTR_INT, > ATTR_NOTHROWCALL_LEAF_LIST) > diff --git a/libatomic/config/nvptx/host-config.h b/libatomic/config/nvptx/host-config.h > new file mode 100644 > index 00000000000..eb9de81f388 > --- /dev/null > +++ b/libatomic/config/nvptx/host-config.h > @@ -0,0 +1,56 @@ > +/* Copyright (C) 2020 Free Software Foundation, Inc. > + > + This file is part of the GNU Atomic Library (libatomic). > + > + Libatomic is free software; you can redistribute it and/or modify it > + under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY > + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS > + FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + more details. > + > + Under Section 7 of GPL version 3, you are granted additional > + permissions described in the GCC Runtime Library Exception, version > + 3.1, as published by the Free Software Foundation. > + > + You should have received a copy of the GNU General Public License and > + a copy of the GCC Runtime Library Exception along with this program; > + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > + <http://www.gnu.org/licenses/>. */ > + > +/* Copied from libatomic/config/posix/host-config.h. */ > + > +/* Included after all more target-specific host-config.h. */ > + > + > +#ifndef protect_start_end > +# ifdef HAVE_ATTRIBUTE_VISIBILITY > +# pragma GCC visibility push(hidden) > +# endif > + > +void libat_lock_1 (void *ptr); > +void libat_unlock_1 (void *ptr); > + > +static inline UWORD > +protect_start (void *ptr) > +{ > + libat_lock_1 (ptr); > + return 0; > +} > + > +static inline void > +protect_end (void *ptr, UWORD dummy UNUSED) > +{ > + libat_unlock_1 (ptr); > +} > + > +# define protect_start_end 1 > +# ifdef HAVE_ATTRIBUTE_VISIBILITY > +# pragma GCC visibility pop > +# endif > +#endif /* protect_start_end */ > + > +#include_next <host-config.h> > diff --git a/libatomic/config/nvptx/lock.c b/libatomic/config/nvptx/lock.c > new file mode 100644 > index 00000000000..dea85a3e5bd > --- /dev/null > +++ b/libatomic/config/nvptx/lock.c > @@ -0,0 +1,56 @@ > +/* Copyright (C) 2020 Free Software Foundation, Inc. > + > + This file is part of the GNU Atomic Library (libatomic). > + > + Libatomic is free software; you can redistribute it and/or modify it > + under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY > + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS > + FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + more details. > + > + Under Section 7 of GPL version 3, you are granted additional > + permissions described in the GCC Runtime Library Exception, version > + 3.1, as published by the Free Software Foundation. > + > + You should have received a copy of the GNU General Public License and > + a copy of the GCC Runtime Library Exception along with this program; > + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > + <http://www.gnu.org/licenses/>. */ > + > +/* Functions libat_lock_n/libat_unlock_n based on GOMP_atomic_start/end in > + libgomp/atomic.c. */ > + > +#include "libatomic_i.h" > + > +static int atomic_lock; > + > +void > +libat_lock_n (void *ptr __attribute__((unused)), > + size_t n __attribute__((unused))) > +{ > + while (__sync_lock_test_and_set (&atomic_lock, 1)) > + ; /* Spin. */ > +} > + > +void > +libat_unlock_n (void *ptr __attribute__((unused)), > + size_t n __attribute__((unused))) > +{ > + __sync_lock_release (&atomic_lock); > +} > + > +void > +libat_lock_1 (void *ptr) > +{ > + libat_lock_n (ptr, 1); > +} > + > +void > +libat_unlock_1 (void *ptr) > +{ > + libat_unlock_n (ptr, 1); > +} > diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt > index efb3b1efb68..7834e0a6528 100644 > --- a/libatomic/configure.tgt > +++ b/libatomic/configure.tgt > @@ -174,6 +174,9 @@ case "${target}" in > UNSUPPORTED=1 > ;; > > + nvptx*-*-*) > + ;; > + > *) > # Who are you? > UNSUPPORTED=1 > diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h > index 081b154e9d7..37de9921024 100644 > --- a/libatomic/libatomic_i.h > +++ b/libatomic/libatomic_i.h > @@ -109,9 +109,11 @@ typedef unsigned UWORD __attribute__((mode(word))); > #define MASK_1 ((UWORD)0xff) > #define MASK_2 ((UWORD)0xffff) > #define MASK_4 ((UWORD)0xffffffff) > +#define MASK_8 ((UWORD)0xffffffffffffffff) > #define INVERT_MASK_1 ((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 1) * CHAR_BIT)) > #define INVERT_MASK_2 ((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 2) * CHAR_BIT)) > #define INVERT_MASK_4 ((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 4) * CHAR_BIT)) > +#define INVERT_MASK_8 ((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 8) * CHAR_BIT)) > > /* Most of the files in this library are compiled multiple times with > N defined to be a power of 2 between 1 and 16. The SIZE macro is >
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 4168190fa42..6178e6a0f77 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -1667,6 +1667,22 @@ "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;" [(set_attr "atomic" "true")]) +(define_expand "atomic_test_and_set" + [(match_operand:SI 0 "nvptx_register_operand") ;; bool success output + (match_operand:QI 1 "memory_operand") ;; memory + (match_operand:SI 2 "const_int_operand")] ;; model + "" +{ + rtx libfunc; + rtx addr; + libfunc = init_one_libfunc ("__atomic_test_and_set_1"); + addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0)); + emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode, + addr, ptr_mode, + operands[2], SImode); + DONE; +}) + (define_insn "nvptx_barsync" [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri") (match_operand:SI 1 "const_int_operand")] diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def index 156a13ce0f8..b802257bd1a 100644 --- a/gcc/sync-builtins.def +++ b/gcc/sync-builtins.def @@ -261,6 +261,8 @@ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, "__sync_synchronize", DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET, "__atomic_test_and_set", BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET_1, "__atomic_test_and_set_1", + BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_CLEAR, "__atomic_clear", BT_FN_VOID_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) diff --git a/libatomic/config/nvptx/host-config.h b/libatomic/config/nvptx/host-config.h new file mode 100644 index 00000000000..eb9de81f388 --- /dev/null +++ b/libatomic/config/nvptx/host-config.h @@ -0,0 +1,56 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + + This file is part of the GNU Atomic Library (libatomic). + + Libatomic is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* Copied from libatomic/config/posix/host-config.h. */ + +/* Included after all more target-specific host-config.h. */ + + +#ifndef protect_start_end +# ifdef HAVE_ATTRIBUTE_VISIBILITY +# pragma GCC visibility push(hidden) +# endif + +void libat_lock_1 (void *ptr); +void libat_unlock_1 (void *ptr); + +static inline UWORD +protect_start (void *ptr) +{ + libat_lock_1 (ptr); + return 0; +} + +static inline void +protect_end (void *ptr, UWORD dummy UNUSED) +{ + libat_unlock_1 (ptr); +} + +# define protect_start_end 1 +# ifdef HAVE_ATTRIBUTE_VISIBILITY +# pragma GCC visibility pop +# endif +#endif /* protect_start_end */ + +#include_next <host-config.h> diff --git a/libatomic/config/nvptx/lock.c b/libatomic/config/nvptx/lock.c new file mode 100644 index 00000000000..dea85a3e5bd --- /dev/null +++ b/libatomic/config/nvptx/lock.c @@ -0,0 +1,56 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + + This file is part of the GNU Atomic Library (libatomic). + + Libatomic is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* Functions libat_lock_n/libat_unlock_n based on GOMP_atomic_start/end in + libgomp/atomic.c. */ + +#include "libatomic_i.h" + +static int atomic_lock; + +void +libat_lock_n (void *ptr __attribute__((unused)), + size_t n __attribute__((unused))) +{ + while (__sync_lock_test_and_set (&atomic_lock, 1)) + ; /* Spin. */ +} + +void +libat_unlock_n (void *ptr __attribute__((unused)), + size_t n __attribute__((unused))) +{ + __sync_lock_release (&atomic_lock); +} + +void +libat_lock_1 (void *ptr) +{ + libat_lock_n (ptr, 1); +} + +void +libat_unlock_1 (void *ptr) +{ + libat_unlock_n (ptr, 1); +} diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt index efb3b1efb68..7834e0a6528 100644 --- a/libatomic/configure.tgt +++ b/libatomic/configure.tgt @@ -174,6 +174,9 @@ case "${target}" in UNSUPPORTED=1 ;; + nvptx*-*-*) + ;; + *) # Who are you? UNSUPPORTED=1 diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h index 081b154e9d7..37de9921024 100644 --- a/libatomic/libatomic_i.h +++ b/libatomic/libatomic_i.h @@ -109,9 +109,11 @@ typedef unsigned UWORD __attribute__((mode(word))); #define MASK_1 ((UWORD)0xff) #define MASK_2 ((UWORD)0xffff) #define MASK_4 ((UWORD)0xffffffff) +#define MASK_8 ((UWORD)0xffffffffffffffff) #define INVERT_MASK_1 ((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 1) * CHAR_BIT)) #define INVERT_MASK_2 ((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 2) * CHAR_BIT)) #define INVERT_MASK_4 ((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 4) * CHAR_BIT)) +#define INVERT_MASK_8 ((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 8) * CHAR_BIT)) /* Most of the files in this library are compiled multiple times with N defined to be a power of 2 between 1 and 16. The SIZE macro is