diff mbox series

[nvptx] Handle memmodel for atomic ops

Message ID 20210517154930.GA9410@delia
State New
Headers show
Series [nvptx] Handle memmodel for atomic ops | expand

Commit Message

Tom de Vries May 17, 2021, 3:49 p.m. UTC
Hi,

[ Tobias, can you test this on volta ? ]

The atomic ops in nvptx.md have memmodel arguments, which are currently
ignored.

Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c
on volta.

Tested libgomp on x86_64-linux with nvptx accelerator.

Any comments?

Thanks,
- Tom

[nvptx] Handle memmodel for atomic ops

gcc/ChangeLog:

2021-05-17  Tom de Vries  <tdevries@suse.de>

	PR target/100497
	* config/nvptx/nvptx-protos.h (nvptx_output_atomic_insn): Declare
	* config/nvptx/nvptx.c (nvptx_output_barrier)
	(nvptx_output_atomic_insn): New function.
	(nvptx_print_operand): Add support for 'B'.
	* config/nvptx/nvptx.md: Use nvptx_output_atomic_insn for atomic
	insns.

---
 gcc/config/nvptx/nvptx-protos.h |  1 +
 gcc/config/nvptx/nvptx.c        | 77 +++++++++++++++++++++++++++++++++++++++++
 gcc/config/nvptx/nvptx.md       | 31 ++++++++++++++---
 3 files changed, 104 insertions(+), 5 deletions(-)

Comments

Tobias Burnus May 17, 2021, 4:47 p.m. UTC | #1
On 17.05.21 17:49, Tom de Vries wrote:
> [ Tobias, can you test this on volta ? ]

Unfortunately, it does not seem to help. On a non-Volta system, it still
works (run time 0.3s) but on a Volta system it fails after 1.5s (abort).

Looking (with an editor) at nvptx-none/lib/mgomp/libgomp.a, I still see
   @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
with no prior membar in GOMP_atomic_start. Likewise with
nvptx-none/lib/libgomp.a which has
   atom.global.exch.b32 %r22,[atomic_lock],1;
I thought a barrier would show up there?

> The atomic ops in nvptx.md have memmodel arguments, which are currently
> ignored.
> Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c
> on volta.

Is there a reason that PR target/96932 isn't listed in the
ChangeLog? Or is it supposed that the barrier does not show up
at GOMP_atomic_start (as it doesn't) and it should show up elsewhere
and still help with those two testcases?

Sorry for not having better news. (Unless I messed up and it is an
issue on my side - but it doesn't look like.)

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
Tom de Vries May 17, 2021, 4:58 p.m. UTC | #2
On 5/17/21 6:47 PM, Tobias Burnus wrote:
> On 17.05.21 17:49, Tom de Vries wrote:
>> [ Tobias, can you test this on volta ? ]
> 
> Unfortunately, it does not seem to help. On a non-Volta system, it still
> works (run time 0.3s) but on a Volta system it fails after 1.5s (abort).
> 
> Looking (with an editor) at nvptx-none/lib/mgomp/libgomp.a, I still see
>  @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
> with no prior membar in GOMP_atomic_start.

I have:
...
@ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
@ %r25 membar.sys;
...

> Likewise with
> nvptx-none/lib/libgomp.a which has
>  atom.global.exch.b32 %r22,[atomic_lock],1;
> I thought a barrier would show up there?
> 

and:
...
atom.global.exch.b32 %r22,[atomic_lock],1;
membar.sys;
...

So both look as expect.

>> The atomic ops in nvptx.md have memmodel arguments, which are currently
>> ignored.
>> Handle these, fixing test-case fails
>> libgomp.c-c++-common/reduction-{5,6}.c
>> on volta.
> 
> Is there a reason that PR target/96932 isn't listed in the
> ChangeLog?

Just that I'm going to mark it a duplicate when this is fixed.

> Or is it supposed that the barrier does not show up
> at GOMP_atomic_start (as it doesn't) and it should show up elsewhere
> and still help with those two testcases?
> 

Nope, it should show up.

> Sorry for not having better news. (Unless I messed up and it is an
> issue on my side - but it doesn't look like.)

Well yes, it's possible that the patch somehow does not work, but then
you'll need to investigate why that is.

Thanks,
- Tom
Tobias Burnus May 17, 2021, 6:10 p.m. UTC | #3
Hi Tom,

short version works :-)
now (and I don't know why it didn't). Long version:

On 17.05.21 18:58, Tom de Vries wrote:
> I have:
> ...
> @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
> @ %r25 membar.sys;
> ...

I tried with the PowerPC cross build - running on
PowerPC + Quadro GV100 did work for both reduction-{5,6}.c
and I can confirm that this line does appear in all four
nvptx libgomp.a (x86->power, power->power and lib/ vs. lib/mgomp/)

On the x86-64 build, which did fail before and where I didn't spot that
line, it now also shows up now (after 'rm -rf' all files) – I wonder why
it didn't work before; the script should have deleted the directory
before compiling the files.
In any case, it is now works there as well with non-Volta 'Tesla K80'
and with 'TITAN V'.

Sorry for the red herring and thanks for the patch!

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
diff mbox series

Patch

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 15122096487..b7e6ae26522 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -57,5 +57,6 @@  extern const char *nvptx_output_set_softstack (unsigned);
 extern const char *nvptx_output_simt_enter (rtx, rtx, rtx);
 extern const char *nvptx_output_simt_exit (rtx);
 extern const char *nvptx_output_red_partition (rtx, rtx);
+extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index ebbfa921589..722b0faa330 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2444,6 +2444,53 @@  nvptx_output_mov_insn (rtx dst, rtx src)
   return "%.\tcvt%t0%t1\t%0, %1;";
 }
 
+/* Output a pre/post barrier for MEM_OPERAND according to MEMMODEL.  */
+
+static void
+nvptx_output_barrier (rtx *mem_operand, int memmodel, bool pre_p)
+{
+  bool post_p = !pre_p;
+
+  switch (memmodel)
+    {
+    case MEMMODEL_RELAXED:
+      return;
+    case MEMMODEL_CONSUME:
+    case MEMMODEL_ACQUIRE:
+    case MEMMODEL_SYNC_ACQUIRE:
+      if (post_p)
+	break;
+      return;
+    case MEMMODEL_RELEASE:
+    case MEMMODEL_SYNC_RELEASE:
+      if (pre_p)
+	break;
+      return;
+    case MEMMODEL_ACQ_REL:
+    case MEMMODEL_SEQ_CST:
+    case MEMMODEL_SYNC_SEQ_CST:
+      if (pre_p || post_p)
+	break;
+      return;
+    default:
+      gcc_unreachable ();
+    }
+
+  output_asm_insn ("%.\tmembar%B0;", mem_operand);
+}
+
+const char *
+nvptx_output_atomic_insn (const char *asm_template, rtx *operands, int mem_pos,
+			  int memmodel_pos)
+{
+  nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]),
+			true);
+  output_asm_insn (asm_template, operands);
+  nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]),
+			false);
+  return "";
+}
+
 static void nvptx_print_operand (FILE *, rtx, int);
 
 /* Output INSN, which is a call to CALLEE with result RESULT.  For ptx, this
@@ -2660,6 +2707,36 @@  nvptx_print_operand (FILE *file, rtx x, int code)
 
   switch (code)
     {
+    case 'B':
+      if (SYMBOL_REF_P (XEXP (x, 0)))
+	switch (SYMBOL_DATA_AREA (XEXP (x, 0)))
+	  {
+	  case DATA_AREA_GENERIC:
+	    /* Assume worst-case: global.  */
+	    gcc_fallthrough (); /* FALLTHROUGH.  */
+	  case DATA_AREA_GLOBAL:
+	    break;
+	  case DATA_AREA_SHARED:
+	    fputs (".cta", file);
+	    return;
+	  case DATA_AREA_LOCAL:
+	  case DATA_AREA_CONST:
+	  case DATA_AREA_PARAM:
+	  default:
+	    gcc_unreachable ();
+	  }
+
+      /* There are 2 cases where membar.sys differs from membar.gl:
+	 - host accesses global memory (f.i. systemwide atomics)
+	 - 2 or more devices are setup in peer-to-peer mode, and one
+	   peer can access global memory of other peer.
+	 Neither are currently supported by openMP/OpenACC on nvptx, but
+	 that could change, so we default to membar.sys.  We could support
+	 this more optimally by adding DATA_AREA_SYS and then emitting
+	 .gl for DATA_AREA_GLOBAL and .sys for DATA_AREA_SYS.  */
+      fputs (".sys", file);
+      return;
+
     case 'A':
       x = XEXP (x, 0);
       gcc_fallthrough (); /* FALLTHROUGH. */
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 00bb8fea821..108de1c0c59 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1642,7 +1642,11 @@ 
    (set (match_dup 1)
 	(unspec_volatile:SDIM [(const_int 0)] UNSPECV_CAS))]
   ""
-  "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;"
+  {
+    const char *t
+      = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;";
+    return nvptx_output_atomic_insn (t, operands, 1, 4);
+  }
   [(set_attr "atomic" "true")])
 
 (define_insn "atomic_exchange<mode>"
@@ -1654,7 +1658,11 @@ 
    (set (match_dup 1)
 	(match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri"))]	;; input
   ""
-  "%.\\tatom%A1.exch.b%T0\\t%0, %1, %2;"
+  {
+    const char *t
+      = "%.\tatom%A1.exch.b%T0\t%0, %1, %2;";
+    return nvptx_output_atomic_insn (t, operands, 1, 3);
+  }
   [(set_attr "atomic" "true")])
 
 (define_insn "atomic_fetch_add<mode>"
@@ -1667,7 +1675,11 @@ 
    (set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
 	(match_dup 1))]
   ""
-  "%.\\tatom%A1.add%t0\\t%0, %1, %2;"
+  {
+    const char *t
+      = "%.\\tatom%A1.add%t0\\t%0, %1, %2;";
+    return nvptx_output_atomic_insn (t, operands, 1, 3);
+  }
   [(set_attr "atomic" "true")])
 
 (define_insn "atomic_fetch_addsf"
@@ -1680,7 +1692,11 @@ 
    (set (match_operand:SF 0 "nvptx_register_operand" "=R")
 	(match_dup 1))]
   ""
-  "%.\\tatom%A1.add%t0\\t%0, %1, %2;"
+  {
+    const char *t
+      = "%.\\tatom%A1.add%t0\\t%0, %1, %2;";
+    return nvptx_output_atomic_insn (t, operands, 1, 3);
+  }
   [(set_attr "atomic" "true")])
 
 (define_code_iterator any_logic [and ior xor])
@@ -1696,7 +1712,12 @@ 
    (set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
 	(match_dup 1))]
   "<MODE>mode == SImode || TARGET_SM35"
-  "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;"
+  {
+    const char *t
+      = "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;";
+    return nvptx_output_atomic_insn (t, operands, 1, 3);
+  }
+
   [(set_attr "atomic" "true")])
 
 (define_expand "atomic_test_and_set"