diff mbox series

[nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap)

Message ID 0162d4b1-fe2a-a670-7a39-6fcef4dae19a@codesourcery.com
State New
Headers show
Series [nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap) | expand

Commit Message

Tobias Burnus Sept. 1, 2020, 11:41 a.m. UTC
Hi Tom, hello all,

it turned out that the testcase fails on PowerPC (but not x86_64)
as the nvptx lto complains: unresolved symbol __sync_val_compare_and_swap_16

The testcase uses int128 – and that's the culprit, but I have no idea
why it only fails with PowerPC and not with x86-64.

Unless someone sees a good way to implement __sync_val_compare_and_swap_16,
I propose to XFAIL it for now.

Namely: Split-off the int128 testcase into a new file and run it only
if not nvptx-offloading on PowerPC. If on nvptx+PowerPC, link that testcase
as well – but XFAIL it.

OK? Or do you/does anyone have a better idea?

Tobias

PS: Should I open a PR for the missing __sync_val_compare_and_swap_16?

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

Comments

Tom de Vries Sept. 1, 2020, 12:58 p.m. UTC | #1
On 9/1/20 1:41 PM, Tobias Burnus wrote:
> Hi Tom, hello all,
> 
> it turned out that the testcase fails on PowerPC (but not x86_64)
> as the nvptx lto complains: unresolved symbol
> __sync_val_compare_and_swap_16
> 
> The testcase uses int128 – and that's the culprit, but I have no idea
> why it only fails with PowerPC and not with x86-64.
> 

Well, I'm guessing the explanation is here in omp-expand.c:
...
/* Expand an GIMPLE_OMP_ATOMIC statement.  We try to expand

   using expand_omp_atomic_fetch_op.  If it failed, we try to

   call expand_omp_atomic_pipeline, and if it fails too, the

   ultimate fallback is wrapping the operation in a mutex

   (expand_omp_atomic_mutex).  REGION is the atomic region built

   by build_omp_regions_1().  */

static void
expand_omp_atomic (struct omp_region *region)
...

In the x86_64 case, when doing:
...
$ gcc-11 src/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
-fdump-tree-all-details -fopenmp
...
we get:
...
  <bb 33> :
  D.3382 = .omp_data_i->res;
  GOMP_atomic_start ();
  D.3383 = MEM[(__int128 * {ref-all})D.3382];

  <bb 34> :
  D.3384 = (_Bool) D.3383;
  if (D.3384 != 0)
    goto <bb 35>; [INV]
  else
    goto <bb 37>; [INV]

  <bb 38> :
  MEM[(__int128 * {ref-all})D.3382] = iftmp.80;
  GOMP_atomic_end ();
...
which means we're triggering the "expand_omp_atomic_mutex" case for x86_64.

Apparently we're triggering the "expand_omp_atomic_pipeline" for powerpc.

> Unless someone sees a good way to implement __sync_val_compare_and_swap_16,

Hmm, one could implement it in the compiler using calls to
GOMP_atomic_start/GOMP_atomic_end, but it feels somewhat hacky.

Thanks,
- Tom
Tom de Vries Sept. 2, 2020, 7:56 a.m. UTC | #2
On 9/1/20 2:58 PM, Tom de Vries wrote:
> On 9/1/20 1:41 PM, Tobias Burnus wrote:
>> Hi Tom, hello all,
>>
>> it turned out that the testcase fails on PowerPC (but not x86_64)
>> as the nvptx lto complains: unresolved symbol
>> __sync_val_compare_and_swap_16
>>
>> The testcase uses int128 – and that's the culprit, but I have no idea
>> why it only fails with PowerPC and not with x86-64.
>>
> 

Reproduced on x86_64 using trigger patch:
...
$ git diff
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index ed17bb00205..eccedac192f 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -153,9 +153,15 @@
     (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
TARGET_SSE))")
    ])

+ (define_mode_iterator ATOMIC2
+    [QI HI SI
+     (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
TARGET_SSE))")
+    TI
+    ])
+
 (define_expand "atomic_load<mode>"
-  [(set (match_operand:ATOMIC 0 "nonimmediate_operand")
-       (unspec:ATOMIC [(match_operand:ATOMIC 1 "memory_operand")
+  [(set (match_operand:ATOMIC2 0 "nonimmediate_operand")
+       (unspec:ATOMIC2 [(match_operand:ATOMIC2 1 "memory_operand")
                        (match_operand:SI 2 "const_int_operand")]
                       UNSPEC_LDA))]
   ""
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
index d0e82b04790..62b0e032c33 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-additional-options "-mcx16" } */

 #include <stdlib.h>

...

Thanks,
- Tom
diff mbox series

Patch

libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16

On PowerPC, only, __sync_val_compare_and_swap_16 is generated for the
testcase – which nvptx currently does not support.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/reduction-16.c: Move int128 test to ...
	* testsuite/libgomp.c-c++-common/reduction-17.c: ... this new test.
	* testsuite/libgomp.c-c++-common/reduction-17a.c: New test.

 .../testsuite/libgomp.c-c++-common/reduction-16.c  |  8 +----
 .../testsuite/libgomp.c-c++-common/reduction-17.c  | 41 ++++++++++++++++++++++
 .../testsuite/libgomp.c-c++-common/reduction-17a.c | 11 ++++++
 3 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
index d0e82b04790..02ec4cd36b8 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -1,4 +1,5 @@ 
 /* { dg-do run } */
+/* See also reduction-17.c for an int128 testcase.  */
 
 #include <stdlib.h>
 
@@ -32,9 +33,6 @@  GENERATE_TEST(char)
 GENERATE_TEST(short)
 GENERATE_TEST(int)
 GENERATE_TEST(long)
-#ifdef __SIZEOF_INT128__
-GENERATE_TEST(__int128)
-#endif
 
 int main(void)
 {
@@ -46,8 +44,4 @@  int main(void)
     abort ();
   if (test_long ())
     abort ();
-#ifdef __SIZEOF_INT128__
-  if (test___int128 ())
-    abort ();
-#endif
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-17.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-17.c
new file mode 100644
index 00000000000..44dabf6e139
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-17.c
@@ -0,0 +1,41 @@ 
+/* { dg-do run { target { ! powerpc*-*-linux* } || { ! offload_target_nvptx } } } */
+/* { dg-require-effective-target int128 } */
+
+/* See also reduction-17a.c for powerpc*-*-linux* + offload_target_nvptx.  */
+/* See also reduction-16.c for a char/short/int/long testcase.  */
+
+#include <stdlib.h>
+
+#define N 512
+
+#define GENERATE_TEST(T)	\
+int test_##T (void)		\
+{				\
+  T a[N], res = 0;		\
+				\
+  for (int i = 0; i < N; ++i)	\
+    a[i] = i & 1;		\
+				\
+_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res || a[i];		\
+				\
+  /* res should be non-zero.  */\
+  if (!res)			\
+    return 1;			\
+				\
+_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res && a[i];		\
+				\
+  /* res should be zero.  */	\
+  return res;			\
+}
+
+GENERATE_TEST(__int128)
+
+int main(void)
+{
+  if (test___int128 ())
+    abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-17a.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-17a.c
new file mode 100644
index 00000000000..1e6a7bc990f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-17a.c
@@ -0,0 +1,11 @@ 
+/* Duplicates reduction-17.c as it fails on PowerPC with nvptx offload
+   due to 'unresolved symbol __sync_val_compare_and_swap_16' on the nvptx side.  */
+
+/* { dg-do link { target { powerpc*-*-linux* } && { offload_target_nvptx } } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-xfail-if "__sync_val_compare_and_swap_16" { *-*-* } } */
+
+/* See also reduction-17a.c for either not powerpc*-*-linux* or not offload_target_nvptx.  */
+/* See also reduction-16.c for a char/short/int/long testcase.  */
+
+#include "reduction-17.c"