
[ARM,1/2] Load-acquire, store-release atomics in AArch32 ARMv8

Message ID 001901cdff02$28d09c30$7a71d490$@tkachov@arm.com
State New

Commit Message

Kyrylo Tkachov Jan. 30, 2013, 3:54 p.m. UTC
Hi all,

This patch implements the atomic built-ins using the new ARMv8 load-acquire
and store-release instructions.
They allow us to generate barrier-free code for a variety of atomic
operations: atomic load, atomic store, atomic compare-and-swap, and
atomic {or, and, add, sub, xor}.
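
As a rough illustration (not part of the patch itself), the kinds of
source-level operations affected look like the following; the instruction
selection noted in the comments is what the new patterns are expected to
produce when compiling for AArch32 ARMv8 (e.g. with -O2 -marm -march=armv8-a,
options assumed for illustration):

/* Illustration only: builtins covered by this patch.  The mnemonics in
   the comments reflect the expected output of the new patterns; exact
   register allocation and scheduling will of course vary.  */

int
load_acquire (int *p)
{
  /* Expected: lda r0, [r0] with no dmb.  */
  return __atomic_load_n (p, __ATOMIC_ACQUIRE);
}

void
store_release (int *p, int v)
{
  /* Expected: stl r1, [r0] with no dmb.  */
  __atomic_store_n (p, v, __ATOMIC_RELEASE);
}

int
fetch_add_seq_cst (int *p)
{
  /* Expected: an ldaex/stlex retry loop with no dmb on either side.  */
  return __atomic_fetch_add (p, 1, __ATOMIC_SEQ_CST);
}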

Tests will come in a separate patch soon.
No regressions on arm-none-eabi. Bootstrap on armv7l-unknown-linux-gnueabihf
successful.

Ok for trunk (now or when stage 1 reopens)?

Thanks,
Kyrill

gcc/ChangeLog

2013-01-23  Kyrylo Tkachov  <kyrylo.tkachov at arm.com>

	* config/arm/arm.c (arm_emit_load_exclusive): Add acq parameter.
	Emit load-acquire versions when acq is true.
	(arm_emit_store_exclusive): Add rel parameter.
	Emit store-release versions when rel is true.
	(arm_split_compare_and_swap): Use acquire-release instructions
	instead of barriers when appropriate.
	(arm_split_atomic_op): Likewise.
	* config/arm/arm.h (TARGET_HAVE_LDACQ): New macro.
	* config/arm/unspecs.md (VUNSPEC_LAX): New unspec.
	(VUNSPEC_SLX): Likewise.
	(VUNSPEC_LDA): Likewise.
	(VUNSPEC_STL): Likewise.
	* config/arm/sync.md (atomic_load<mode>): New pattern.
	(atomic_store<mode>): Likewise.
	(arm_load_acquire_exclusive<mode>): Likewise.
	(arm_load_acquire_exclusivesi): Likewise.
	(arm_load_acquire_exclusivedi): Likewise.
	(arm_store_release_exclusivedi): Likewise.
	(arm_store_release_exclusive<mode>): Likewise.

Comments

Kyrylo Tkachov March 22, 2013, 12:09 p.m. UTC | #1
Ping?

http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01441.html

I thought this was ok'd for 4.9 but I can't seem to find the ok email in the
archives.

Thanks,
Kyrill

Richard Earnshaw March 22, 2013, 3:48 p.m. UTC | #2
On 30/01/13 15:54, Kyrylo Tkachov wrote:
>
>
> atomics.txt
>
>
> @@ -26067,6 +26099,15 @@ arm_expand_compare_and_swap (rtx operands[])
>     mod_f = operands[7];
>     mode = GET_MODE (mem);
>
> +  /* Normally the succ memory model must be stronger than fail, but in the
> +   unlikely event of fail being ACQUIRE and succ being RELEASE we need to
> +   promote succ to ACQ_REL so that we don't lose the acquire semantics.  */

Can you double-check the indentation here?  The 'u' from 'unlikely'
should line up with the 'N' from 'Normally'.
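
That is, the continuation lines want two more leading spaces so the text
lines up under 'Normally':

  /* Normally the succ memory model must be stronger than fail, but in the
     unlikely event of fail being ACQUIRE and succ being RELEASE we need to
     promote succ to ACQ_REL so that we don't lose the acquire semantics.  */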

Otherwise OK.

R.
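
(For context, the corner case the quoted comment guards against can be
reached from a call along these lines -- a hypothetical example, not taken
from the patch or the thread:)

/* Hypothetical caller hitting the fail == ACQUIRE, succ == RELEASE case:
   without promoting succ to ACQ_REL, the generated sequence would not use
   a load-acquire and the failure path would lose its acquire ordering.  */
int
cas_release_acquire (int *p, int *expected, int desired)
{
  return __atomic_compare_exchange_n (p, expected, desired, 0 /* weak = false */,
				      __ATOMIC_RELEASE /* success */,
				      __ATOMIC_ACQUIRE /* failure */);
}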

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9d3981d..f989f54 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25999,39 +25999,71 @@  arm_post_atomic_barrier (enum memmodel model)
     emit_insn (gen_memory_barrier ());
 }
 
-/* Emit the load-exclusive and store-exclusive instructions.  */
+/* Emit the load-exclusive and store-exclusive instructions.
+   Use acquire and release versions if necessary.  */
 
 static void
-arm_emit_load_exclusive (enum machine_mode mode, rtx rval, rtx mem)
+arm_emit_load_exclusive (enum machine_mode mode, rtx rval, rtx mem, bool acq)
 {
   rtx (*gen) (rtx, rtx);
 
-  switch (mode)
+  if (acq)
     {
-    case QImode: gen = gen_arm_load_exclusiveqi; break;
-    case HImode: gen = gen_arm_load_exclusivehi; break;
-    case SImode: gen = gen_arm_load_exclusivesi; break;
-    case DImode: gen = gen_arm_load_exclusivedi; break;
-    default:
-      gcc_unreachable ();
+      switch (mode)
+        {
+        case QImode: gen = gen_arm_load_acquire_exclusiveqi; break;
+        case HImode: gen = gen_arm_load_acquire_exclusivehi; break;
+        case SImode: gen = gen_arm_load_acquire_exclusivesi; break;
+        case DImode: gen = gen_arm_load_acquire_exclusivedi; break;
+        default:
+          gcc_unreachable ();
+        }
+    }
+  else
+    {
+      switch (mode)
+        {
+        case QImode: gen = gen_arm_load_exclusiveqi; break;
+        case HImode: gen = gen_arm_load_exclusivehi; break;
+        case SImode: gen = gen_arm_load_exclusivesi; break;
+        case DImode: gen = gen_arm_load_exclusivedi; break;
+        default:
+          gcc_unreachable ();
+        }
     }
 
   emit_insn (gen (rval, mem));
 }
 
 static void
-arm_emit_store_exclusive (enum machine_mode mode, rtx bval, rtx rval, rtx mem)
+arm_emit_store_exclusive (enum machine_mode mode, rtx bval, rtx rval,
+                          rtx mem, bool rel)
 {
   rtx (*gen) (rtx, rtx, rtx);
 
-  switch (mode)
+  if (rel)
     {
-    case QImode: gen = gen_arm_store_exclusiveqi; break;
-    case HImode: gen = gen_arm_store_exclusivehi; break;
-    case SImode: gen = gen_arm_store_exclusivesi; break;
-    case DImode: gen = gen_arm_store_exclusivedi; break;
-    default:
-      gcc_unreachable ();
+      switch (mode)
+        {
+        case QImode: gen = gen_arm_store_release_exclusiveqi; break;
+        case HImode: gen = gen_arm_store_release_exclusivehi; break;
+        case SImode: gen = gen_arm_store_release_exclusivesi; break;
+        case DImode: gen = gen_arm_store_release_exclusivedi; break;
+        default:
+          gcc_unreachable ();
+        }
+    }
+  else
+    {
+      switch (mode)
+        {
+        case QImode: gen = gen_arm_store_exclusiveqi; break;
+        case HImode: gen = gen_arm_store_exclusivehi; break;
+        case SImode: gen = gen_arm_store_exclusivesi; break;
+        case DImode: gen = gen_arm_store_exclusivedi; break;
+        default:
+          gcc_unreachable ();
+        }
     }
 
   emit_insn (gen (bval, rval, mem));
@@ -26067,6 +26099,15 @@  arm_expand_compare_and_swap (rtx operands[])
   mod_f = operands[7];
   mode = GET_MODE (mem);
 
+  /* Normally the succ memory model must be stronger than fail, but in the
+   unlikely event of fail being ACQUIRE and succ being RELEASE we need to
+   promote succ to ACQ_REL so that we don't lose the acquire semantics.  */
+
+  if (TARGET_HAVE_LDACQ
+      && INTVAL (mod_f) == MEMMODEL_ACQUIRE
+      && INTVAL (mod_s) == MEMMODEL_RELEASE)
+    mod_s = GEN_INT (MEMMODEL_ACQ_REL);
+
   switch (mode)
     {
     case QImode:
@@ -26141,7 +26182,19 @@  arm_split_compare_and_swap (rtx operands[])
   scratch = operands[7];
   mode = GET_MODE (mem);
 
-  arm_pre_atomic_barrier (mod_s);
+  bool use_acquire = TARGET_HAVE_LDACQ
+                     && !(mod_s == MEMMODEL_RELAXED
+                          || mod_s == MEMMODEL_CONSUME
+                          || mod_s == MEMMODEL_RELEASE);
+
+  bool use_release = TARGET_HAVE_LDACQ
+                     && !(mod_s == MEMMODEL_RELAXED
+                          || mod_s == MEMMODEL_CONSUME
+                          || mod_s == MEMMODEL_ACQUIRE);
+
+  /* Checks whether a barrier is needed and emits one accordingly.  */
+  if (!(use_acquire || use_release))
+    arm_pre_atomic_barrier (mod_s);
 
   label1 = NULL_RTX;
   if (!is_weak)
@@ -26151,7 +26204,7 @@  arm_split_compare_and_swap (rtx operands[])
     }
   label2 = gen_label_rtx ();
 
-  arm_emit_load_exclusive (mode, rval, mem);
+  arm_emit_load_exclusive (mode, rval, mem, use_acquire);
 
   cond = arm_gen_compare_reg (NE, rval, oldval, scratch);
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
@@ -26159,7 +26212,7 @@  arm_split_compare_and_swap (rtx operands[])
 			    gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   emit_unlikely_jump (gen_rtx_SET (VOIDmode, pc_rtx, x));
 
-  arm_emit_store_exclusive (mode, scratch, mem, newval);
+  arm_emit_store_exclusive (mode, scratch, mem, newval, use_release);
 
   /* Weak or strong, we want EQ to be true for success, so that we
      match the flags that we got from the compare above.  */
@@ -26178,7 +26231,9 @@  arm_split_compare_and_swap (rtx operands[])
   if (mod_f != MEMMODEL_RELAXED)
     emit_label (label2);
 
-  arm_post_atomic_barrier (mod_s);
+  /* Checks whether a barrier is needed and emits one accordingly.  */
+  if (!(use_acquire || use_release))
+    arm_post_atomic_barrier (mod_s);
 
   if (mod_f == MEMMODEL_RELAXED)
     emit_label (label2);
@@ -26193,7 +26248,19 @@  arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   enum machine_mode wmode = (mode == DImode ? DImode : SImode);
   rtx label, x;
 
-  arm_pre_atomic_barrier (model);
+  bool use_acquire = TARGET_HAVE_LDACQ
+                     && !(model == MEMMODEL_RELAXED
+                          || model == MEMMODEL_CONSUME
+                          || model == MEMMODEL_RELEASE);
+
+  bool use_release = TARGET_HAVE_LDACQ
+                     && !(model == MEMMODEL_RELAXED
+                          || model == MEMMODEL_CONSUME
+                          || model == MEMMODEL_ACQUIRE);
+
+  /* Checks whether a barrier is needed and emits one accordingly.  */
+  if (!(use_acquire || use_release))
+    arm_pre_atomic_barrier (model);
 
   label = gen_label_rtx ();
   emit_label (label);
@@ -26206,7 +26273,7 @@  arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  arm_emit_load_exclusive (mode, old_out, mem);
+  arm_emit_load_exclusive (mode, old_out, mem, use_acquire);
 
   switch (code)
     {
@@ -26254,12 +26321,15 @@  arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
       break;
     }
 
-  arm_emit_store_exclusive (mode, cond, mem, gen_lowpart (mode, new_out));
+  arm_emit_store_exclusive (mode, cond, mem, gen_lowpart (mode, new_out),
+                            use_release);
 
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label));
 
-  arm_post_atomic_barrier (model);
+  /* Checks whether a barrier is needed and emits one accordingly.  */
+  if (!(use_acquire || use_release))
+    arm_post_atomic_barrier (model);
 }
 
 #define MAX_VECT_LEN 16
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 6d336e8..9ef29a8 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -350,6 +350,9 @@  extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_LDREXD	(((arm_arch6k && TARGET_ARM) || arm_arch7) \
 				 && arm_arch_notm)
 
+/* Nonzero if this chip supports load-acquire and store-release.  */
+#define TARGET_HAVE_LDACQ	(TARGET_ARM_ARCH >= 8)
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
 				 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 281e51e..9802348 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -65,6 +65,42 @@ 
    (set_attr "conds" "unconditional")
    (set_attr "predicable" "no")])
 
+(define_insn "atomic_load<mode>"
+  [(set (match_operand:QHSI 0 "register_operand" "=r")
+    (unspec_volatile:QHSI
+      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q")
+       (match_operand:SI 2 "const_int_operand")]		;; model
+      VUNSPEC_LDA))]
+  "TARGET_HAVE_LDACQ"
+  {
+    enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+    if (model == MEMMODEL_RELAXED
+        || model == MEMMODEL_CONSUME
+        || model == MEMMODEL_RELEASE)
+      return \"ldr<sync_sfx>\\t%0, %1\";
+    else
+      return \"lda<sync_sfx>\\t%0, %1\";
+  }
+)
+
+(define_insn "atomic_store<mode>"
+  [(set (match_operand:QHSI 0 "memory_operand" "=Q")
+    (unspec_volatile:QHSI
+      [(match_operand:QHSI 1 "general_operand" "r")
+       (match_operand:SI 2 "const_int_operand")]		;; model
+      VUNSPEC_STL))]
+  "TARGET_HAVE_LDACQ"
+  {
+    enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+    if (model == MEMMODEL_RELAXED
+        || model == MEMMODEL_CONSUME
+        || model == MEMMODEL_ACQUIRE)
+      return \"str<sync_sfx>\t%1, %0\";
+    else
+      return \"stl<sync_sfx>\t%1, %0\";
+  }
+)
+
 ;; Note that ldrd and vldr are *not* guaranteed to be single-copy atomic,
 ;; even for a 64-bit aligned address.  Instead we use a ldrexd unparied
 ;; with a store.
@@ -327,6 +363,16 @@ 
   "ldrex<sync_sfx>%?\t%0, %C1"
   [(set_attr "predicable" "yes")])
 
+(define_insn "arm_load_acquire_exclusive<mode>"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (zero_extend:SI
+	  (unspec_volatile:NARROW
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
+	    VUNSPEC_LAX)))]
+  "TARGET_HAVE_LDACQ"
+  "ldaex<sync_sfx>%?\\t%0, %C1"
+  [(set_attr "predicable" "yes")])
+
 (define_insn "arm_load_exclusivesi"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec_volatile:SI
@@ -336,6 +382,15 @@ 
   "ldrex%?\t%0, %C1"
   [(set_attr "predicable" "yes")])
 
+(define_insn "arm_load_acquire_exclusivesi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec_volatile:SI
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
+	  VUNSPEC_LAX))]
+  "TARGET_HAVE_LDACQ"
+  "ldaex%?\t%0, %C1"
+  [(set_attr "predicable" "yes")])
+
 (define_insn "arm_load_exclusivedi"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec_volatile:DI
@@ -345,6 +400,15 @@ 
   "ldrexd%?\t%0, %H0, %C1"
   [(set_attr "predicable" "yes")])
 
+(define_insn "arm_load_acquire_exclusivedi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+	(unspec_volatile:DI
+	  [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
+	  VUNSPEC_LAX))]
+  "TARGET_HAVE_LDACQ && ARM_DOUBLEWORD_ALIGN"
+  "ldaexd%?\t%0, %H0, %C1"
+  [(set_attr "predicable" "yes")])
+
 (define_insn "arm_store_exclusive<mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
 	(unspec_volatile:SI [(const_int 0)] VUNSPEC_SC))
@@ -368,3 +432,31 @@ 
     return "strex<sync_sfx>%?\t%0, %2, %C1";
   }
   [(set_attr "predicable" "yes")])
+
+(define_insn "arm_store_release_exclusivedi"
+  [(set (match_operand:SI 0 "s_register_operand" "=&r")
+	(unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
+   (set (match_operand:DI 1 "mem_noofs_operand" "=Ua")
+	(unspec_volatile:DI
+	  [(match_operand:DI 2 "s_register_operand" "r")]
+	  VUNSPEC_SLX))]
+  "TARGET_HAVE_LDACQ && ARM_DOUBLEWORD_ALIGN"
+  {
+    rtx value = operands[2];
+    /* See comment in arm_store_exclusive<mode> above.  */
+    gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
+    operands[3] = gen_rtx_REG (SImode, REGNO (value) + 1);
+    return "stlexd%?\t%0, %2, %3, %C1";
+  }
+  [(set_attr "predicable" "yes")])
+
+(define_insn "arm_store_release_exclusive<mode>"
+  [(set (match_operand:SI 0 "s_register_operand" "=&r")
+	(unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
+   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua")
+	(unspec_volatile:QHSI
+	  [(match_operand:QHSI 2 "s_register_operand" "r")]
+	  VUNSPEC_SLX))]
+  "TARGET_HAVE_LDACQ"
+  "stlex<sync_sfx>%?\t%0, %2, %C1"
+  [(set_attr "predicable" "yes")])
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 3985912..508603c 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -139,6 +139,10 @@ 
   VUNSPEC_ATOMIC_OP	; Represent an atomic operation.
   VUNSPEC_LL		; Represent a load-register-exclusive.
   VUNSPEC_SC		; Represent a store-register-exclusive.
+  VUNSPEC_LAX		; Represent a load-register-acquire-exclusive.
+  VUNSPEC_SLX		; Represent a store-register-release-exclusive.
+  VUNSPEC_LDA		; Represent a load-register-acquire.
+  VUNSPEC_STL		; Represent a store-register-release.
 ])
 
 ;; Enumerators for NEON unspecs.