diff mbox series

[RFC,nvptx,libgomp] Add 128-bit atomic support

Message ID 9cde09ef-8a7f-fbc7-3677-fb6763cc82b4@suse.de
State New
Headers show
Series [RFC,nvptx,libgomp] Add 128-bit atomic support | expand

Commit Message

Tom de Vries Sept. 2, 2020, 10:22 a.m. UTC
[ was: Re: [patch][nvptx] libgomp: Split testcase in order to XFAIL
__sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for
subword compare-and-swap) ]

On 9/2/20 9:56 AM, Tom de Vries wrote:
> On 9/1/20 2:58 PM, Tom de Vries wrote:
>> On 9/1/20 1:41 PM, Tobias Burnus wrote:
>>> Hi Tom, hello all,
>>>
>>> it turned out that the testcase fails on PowerPC (but not x86_64)
>>> as the nvptx lto complains: unresolved symbol
>>> __sync_val_compare_and_swap_16
>>>
>>> The testcase uses int128 – and that's the culprit, but I have no idea
>>> why it only fails with PowerPC and not with x86-64.
>>>
>>
> 
> Reproduced on x86_64 using trigger patch:
> ...
> $ git diff
> diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
> index ed17bb00205..eccedac192f 100644
> --- a/gcc/config/i386/sync.md
> +++ b/gcc/config/i386/sync.md
> @@ -153,9 +153,15 @@
>      (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
> TARGET_SSE))")
>     ])
> 
> + (define_mode_iterator ATOMIC2
> +    [QI HI SI
> +     (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
> TARGET_SSE))")
> +    TI
> +    ])
> +
>  (define_expand "atomic_load<mode>"
> -  [(set (match_operand:ATOMIC 0 "nonimmediate_operand")
> -       (unspec:ATOMIC [(match_operand:ATOMIC 1 "memory_operand")
> +  [(set (match_operand:ATOMIC2 0 "nonimmediate_operand")
> +       (unspec:ATOMIC2 [(match_operand:ATOMIC2 1 "memory_operand")
>                         (match_operand:SI 2 "const_int_operand")]
>                        UNSPEC_LDA))]
>    ""
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> index d0e82b04790..62b0e032c33 100644
> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> @@ -1,4 +1,5 @@
>  /* { dg-do run } */
> +/* { dg-additional-options "-mcx16" } */
> 
>  #include <stdlib.h>
> 
> ...
> 

And test-case passes on x86_64 with this patch (obviously, in
combination with trigger patch above).

Jakub, WDYT?

Tobias, can you try on powerpc?

Thanks,
- Tom

Comments

Jakub Jelinek Sept. 2, 2020, 10:44 a.m. UTC | #1
On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
> And test-case passes on x86_64 with this patch (obviously, in
> combination with trigger patch above).
> 
> Jakub, WDYT?

I guess the normal answer would be use libatomic, but it isn't ported for
nvptx.
I guess at least temporarily this is ok, though I'm wondering why
you need __sync_*_16 rather than __atomic_*_16, or perhaps both __sync_* and
__atomic_*.

What happens if you try
unsigned __int128 v;
#pragma omp declare target (v)
int
main ()
{
  #pragma omp target
  {
    __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
    __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
    unsigned __int128v exp = 2;
    __atomic_compare_exchange_n (&v, &expected, 7, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE);
  }
}
etc. (see some gcc.dg/atomic* tests, ditto for __sync_*)?
I guess better not to throw everything into one test, because not every
target supports them all (e.g. I think x86_64 doesn't really do 128-bit
atomic loads because the cmpxchg16b insn are not appropriate for .rodata
locations).

	Jakub
Tobias Burnus Sept. 2, 2020, 11:30 a.m. UTC | #2
On 9/2/20 12:22 PM, Tom de Vries wrote:

> Tobias, can you try on powerpc?

Testcase now compiles and runs w/o error message.

On 9/2/20 12:44 PM, Jakub Jelinek wrote:

> I guess the normal answer would be use libatomic, but it isn't ported for
> nvptx.
> I guess at least temporarily this is ok,though I'm wondering why
> you need __sync_*_16 rather than __atomic_*_16, or perhaps both __sync_* and
> __atomic_*.
>
> What happens if you try
> unsigned __int128 v;

...

I had to change "unsigned __int128" and "unsigned __int128v" to
"__uint128_t" and "expected" to "exp". Result without offloading
configured on x86-64-gnu-linux:

aotmic.c:(.text+0x84): undefined reference to `__atomic_fetch_add_16'
/usr/bin/ld: aotmic.c:(.text+0xa3): undefined reference to `__atomic_fetch_add_16'
/usr/bin/ld: aotmic.c:(.text+0xda): undefined reference to `__atomic_compare_exchange_16'

And on PowerPC with nvptx (without the RFC patch):

atomic.c: In function 'main._omp_fn.0':
atomic.c:6:11: internal compiler error: in write_fn_proto, at config/nvptx/nvptx.c:913
     6 |   #pragma omp target
       |           ^

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Tom de Vries Sept. 2, 2020, 11:48 a.m. UTC | #3
On 9/2/20 12:44 PM, Jakub Jelinek wrote:
> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
>> And test-case passes on x86_64 with this patch (obviously, in
>> combination with trigger patch above).
>>
>> Jakub, WDYT?
> 
> I guess the normal answer would be use libatomic, but it isn't ported for
> nvptx.

Ah, I was not aware of that one, filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that.

> I guess at least temporarily this is ok, though I'm wondering why
> you need __sync_*_16 rather than __atomic_*_16, 

That's what omp-expand.c uses in expand_omp_atomic_pipeline:
BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N .

Thanks,
- Tom

> or perhaps both __sync_* and
> __atomic_*.
> 
> What happens if you try
> unsigned __int128 v;
> #pragma omp declare target (v)
> int
> main ()
> {
>   #pragma omp target
>   {
>     __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
>     __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
>     unsigned __int128v exp = 2;
>     __atomic_compare_exchange_n (&v, &expected, 7, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE);
>   }
> }
> etc. (see some gcc.dg/atomic* tests, ditto for __sync_*)?
> I guess better not to throw everything into one test, because not every
> target supports them all (e.g. I think x86_64 doesn't really do 128-bit
> atomic loads because the cmpxchg16b insn are not appropriate for .rodata
> locations).
> 
> 	Jakub
>
Tom de Vries Sept. 11, 2020, 2:24 p.m. UTC | #4
On 9/2/20 1:48 PM, Tom de Vries wrote:
> On 9/2/20 12:44 PM, Jakub Jelinek wrote:
>> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
>>> And test-case passes on x86_64 with this patch (obviously, in
>>> combination with trigger patch above).
>>>
>>> Jakub, WDYT?
>>
>> I guess the normal answer would be use libatomic, but it isn't ported for
>> nvptx.
> 
> Ah, I was not aware of that one, filed
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that.
> 
>> I guess at least temporarily this is ok, though I'm wondering why
>> you need __sync_*_16 rather than __atomic_*_16, 
> 
> That's what omp-expand.c uses in expand_omp_atomic_pipeline:
> BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N .
> 

I've got an updated version of this patch. It:
- no longer supplies the __atomic_load_16, since that's now handled by
  libatomic
- the __sync_val_compare_and_swap now uses __atomic_compare_and_swap,
  which also falls back on libatomic.

I'm currently retesting.

Any comments?

Otherwise, I'll commit on Monday.

Thanks,
- Tom
Tom de Vries Sept. 11, 2020, 2:25 p.m. UTC | #5
[ Fixing ENOPATCH. ]

On 9/11/20 4:24 PM, Tom de Vries wrote:
> On 9/2/20 1:48 PM, Tom de Vries wrote:
>> On 9/2/20 12:44 PM, Jakub Jelinek wrote:
>>> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
>>>> And test-case passes on x86_64 with this patch (obviously, in
>>>> combination with trigger patch above).
>>>>
>>>> Jakub, WDYT?
>>>
>>> I guess the normal answer would be use libatomic, but it isn't ported for
>>> nvptx.
>>
>> Ah, I was not aware of that one, filed
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that.
>>
>>> I guess at least temporarily this is ok, though I'm wondering why
>>> you need __sync_*_16 rather than __atomic_*_16, 
>>
>> That's what omp-expand.c uses in expand_omp_atomic_pipeline:
>> BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N .
>>
> 
> I've got an updated version of this patch. It:
> - no longer supplies the __atomic_load_16, since that's now handled by
>   libatomic
> - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap,
>   which also falls back on libatomic.
> 
> I'm currently retesting.
> 
> Any comments?
> 
> Otherwise, I'll commit on Monday.
> 
> Thanks,
> - Tom
>
Jakub Jelinek Sept. 11, 2020, 2:37 p.m. UTC | #6
On Fri, Sep 11, 2020 at 04:24:42PM +0200, Tom de Vries wrote:
> I've got an updated version of this patch. It:
> - no longer supplies the __atomic_load_16, since that's now handled by
>   libatomic
> - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap,
>   which also falls back on libatomic.
> 
> I'm currently retesting.
> 
> Any comments?
> 
> Otherwise, I'll commit on Monday.

If some functions are now in libatomic, do we expect users to know that and
pass -foffload=-latomic to link, or will mkoffload or whatever do that
automatically?  If the latter, will e.g. libgomp testsuite ensure that
during testing the library can be found even non-installed, if the former,
will libgomp testsuite add it for the respective testcases that need it,
perhaps under special options?

	Jakub
Andrew Stubbs Sept. 11, 2020, 2:48 p.m. UTC | #7
On 11/09/2020 15:25, Tom de Vries wrote:
> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> 
> @@ -1,4 +1,5 @@
> 
> /*·{·dg-do·run·}·*/
> +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/

This will probably break amdgcn, where libatomic does not exist.

Andrew
Tom de Vries Sept. 11, 2020, 3:03 p.m. UTC | #8
On 2020-09-11 16:48, Andrew Stubbs wrote:
> On 11/09/2020 15:25, Tom de Vries wrote:
>> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
>> 
>> @@ -1,4 +1,5 @@
>> 
>> /*·{·dg-do·run·}·*/
>> +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/
> 
> This will probably break amdgcn, where libatomic does not exist.
> 

It looks like the customary way to handle that is to use 
offload_target_nvptx.

Thanks,
- Tom
Tobias Burnus Sept. 11, 2020, 3:29 p.m. UTC | #9
On 9/11/20 5:03 PM, tdevries wrote:

> On 2020-09-11 16:48, Andrew Stubbs wrote:
>> On 11/09/2020 15:25, Tom de Vries wrote:
>>> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
>>> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
>>>
>>> @@ -1,4 +1,5 @@
>>>
>>> /*·{·dg-do·run·}·*/
>>> +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/
>>
>> This will probably break amdgcn, where libatomic does not exist.
>>
> It looks like the customary way to handle that is to use
> offload_target_nvptx.

Or   { target { powerpc*-*-* } }  ?

For some (known) reasons, the __sync_val_compare_and_swap_16 is
produced for powerpc but not for x86-64.

I could imagine that GCN is affected in the same way as nvptx,
except that AMD's ROC is currently not supported for PowerPC,
if I understand it correctly. If FAIL start to occur in some
CPU/GPU combinations, it can be still revisited.

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
diff mbox series

Patch

[nvptx, libgomp] Add 128-bit atomic support

---
 libgomp/config/nvptx/atomic.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/libgomp/config/nvptx/atomic.c b/libgomp/config/nvptx/atomic.c
new file mode 100644
index 00000000000..49a6d350827
--- /dev/null
+++ b/libgomp/config/nvptx/atomic.c
@@ -0,0 +1,34 @@ 
+#include <stdbool.h>
+
+#include "../../atomic.c"
+
+unsigned __int128
+__sync_val_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval,
+				unsigned __int128 newval)
+{
+  volatile unsigned __int128 *ptr = vptr;
+  GOMP_atomic_start ();
+  unsigned __int128 val = *ptr;
+  if (val == oldval)
+    *ptr = newval;
+  GOMP_atomic_end ();
+  return val;
+}
+
+bool
+__sync_bool_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval,
+				 unsigned __int128 newval)
+{
+  return __sync_val_compare_and_swap_16 (vptr, oldval, newval) == oldval;
+}
+
+unsigned __int128
+__atomic_load_16 (const volatile void *vptr,
+		  int memorder __attribute__((unused)))
+{
+  const volatile unsigned __int128 *ptr = vptr;
+  GOMP_atomic_start ();
+  unsigned __int128 val = *ptr;
+  GOMP_atomic_end ();
+  return val;
+}