[ARM,2/7,ping2] Adapt atomic and exclusive load and store to ARMv8-M Baseline

Message ID 5980c8f3-4546-9169-2ded-f06b4a88b766@foss.arm.com
State New

Commit Message

Thomas Preudhomme Oct. 14, 2016, 1:48 p.m. UTC
Ping?

Best regards,

Thomas

On 03/10/16 17:42, Thomas Preudhomme wrote:
> Ping?
>
> Best regards,
>
> Thomas
>
> On 22/09/16 14:41, Thomas Preudhomme wrote:
>> Hi,
>>
>> This patch is part of a patch series to add support for atomic operations on
>> ARMv8-M Baseline targets in GCC. This specific patch adapts atomic and exclusive
>> load and store patterns to the constraints of ARMv8-M Baseline. It consists of
>> two sets of changes:
>>
>> - adding non-predicated output templates, because ARMv8-M Baseline does not
>> have the IT instruction
>> - using low registers for ldr/str
>>
>> Together these changes require creating 2 new alternatives for atomic_load and
>> atomic_store: (i) one for the relaxed, consume and release memory models, where
>> ldr/str are used and thus low registers must be used, and (ii) another one for
>> the other memory models (matched by the new Pf constraint), where lda/stl are
>> used. These are separate from the alternative for 32-bit targets, whose output
>> template expects predication.
>>
>> ChangeLog entry is as follows:
>>
>> *** gcc/ChangeLog ***
>>
>> 2016-07-05  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>
>>         * config/arm/constraints.md (Q constraint): Document its use for
>>         Thumb-1.
>>         (Pf constraint): New constraint for memory models other than relaxed,
>>         consume or release.
>>         * config/arm/sync.md (atomic_load<mode>): Add new ARMv8-M Baseline only
>>         alternatives to allow any register when memory model matches Pf and
>>         thus lda is used, but only low registers otherwise.  Use unpredicated
>>         output template for Thumb-1 targets.
>>         (atomic_store<mode>): Likewise for stl.
>>         (arm_load_exclusive<mode>): Add new ARMv8-M Baseline only alternative
>>         whose output template does not have predication.
>>         (arm_load_acquire_exclusive<mode>): Likewise.
>>         (arm_load_exclusivesi): Likewise.
>>         (arm_load_acquire_exclusivesi): Likewise.
>>         (arm_store_release_exclusive<mode>): Likewise.
>>         (arm_store_exclusive<mode>): Use unpredicated output template for
>>         Thumb-1 targets.
>>
>>
>> Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on all
>> atomic and synchronization testcases in the testsuite [2]. Patchset was also
>> bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and Thumb mode at
>> optimization level -O1 and above [1] without any regression in the testsuite and
>> no code generation difference in libitm and libgomp.
>>
>> Code generation for ARMv8-M Baseline has been manually examined and compared
>> against ARMv8-A Thumb-2 for the following configurations without finding any
>> issue:
>>
>> gcc.dg/atomic-op-2.c at -Os
>> gcc.dg/atomic-compare-exchange-2.c at -Os
>> gcc.dg/atomic-compare-exchange-3.c at -O3
>>
>>
>> Is this ok for trunk?
>>
>> Best regards,
>>
>> Thomas
>>
>> [1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and
>> undefined ("-O2 -g")
>> [2] The exact list is:
>>
>> gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c
>> gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c
>> gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c
>> gcc/testsuite/gcc.dg/atomic-exchange-1.c
>> gcc/testsuite/gcc.dg/atomic-exchange-2.c
>> gcc/testsuite/gcc.dg/atomic-exchange-3.c
>> gcc/testsuite/gcc.dg/atomic-fence.c
>> gcc/testsuite/gcc.dg/atomic-flag.c
>> gcc/testsuite/gcc.dg/atomic-generic.c
>> gcc/testsuite/gcc.dg/atomic-generic-aux.c
>> gcc/testsuite/gcc.dg/atomic-invalid-2.c
>> gcc/testsuite/gcc.dg/atomic-load-1.c
>> gcc/testsuite/gcc.dg/atomic-load-2.c
>> gcc/testsuite/gcc.dg/atomic-load-3.c
>> gcc/testsuite/gcc.dg/atomic-lockfree.c
>> gcc/testsuite/gcc.dg/atomic-lockfree-aux.c
>> gcc/testsuite/gcc.dg/atomic-noinline.c
>> gcc/testsuite/gcc.dg/atomic-noinline-aux.c
>> gcc/testsuite/gcc.dg/atomic-op-1.c
>> gcc/testsuite/gcc.dg/atomic-op-2.c
>> gcc/testsuite/gcc.dg/atomic-op-3.c
>> gcc/testsuite/gcc.dg/atomic-op-6.c
>> gcc/testsuite/gcc.dg/atomic-store-1.c
>> gcc/testsuite/gcc.dg/atomic-store-2.c
>> gcc/testsuite/gcc.dg/atomic-store-3.c
>> gcc/testsuite/g++.dg/ext/atomic-1.C
>> gcc/testsuite/g++.dg/ext/atomic-2.C
>> gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c
>> gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c
>> gcc/testsuite/gcc.target/arm/atomic-op-acquire.c
>> gcc/testsuite/gcc.target/arm/atomic-op-char.c
>> gcc/testsuite/gcc.target/arm/atomic-op-consume.c
>> gcc/testsuite/gcc.target/arm/atomic-op-int.c
>> gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c
>> gcc/testsuite/gcc.target/arm/atomic-op-release.c
>> gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c
>> gcc/testsuite/gcc.target/arm/atomic-op-short.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
>> gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c
>> gcc/testsuite/gcc.target/arm/sync-1.c
>> gcc/testsuite/gcc.target/arm/synchronize.c
>> gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
>> gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
>> gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
>> gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
>> libstdc++-v3/testsuite/29_atomics/atomic/60658.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/62259.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/64658.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/65147.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/70766.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/cons/49445.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/cons/constexpr.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/cons/copy_list.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/cons/default.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/cons/direct_list.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/cons/single_value.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/cons/user_pod.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/operators/51811.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/operators/56011.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_assignment.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_conversion.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/base_classes.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/compare_exchange_lowering.cc
>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/explicit_instantiation/1.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/1.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/56012.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/aggregate.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/default.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/standard_layout.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/trivial.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/60940.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/65147.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/constexpr.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/copy_list.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/default.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/direct_list.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/single_value.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/bitwise.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/decrement.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/increment.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_assignment.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_conversion.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/standard_layout.cc
>> libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/trivial.cc
>> libstdc++-v3/testsuite/29_atomics/headers/atomic/functions_std_c++0x.cc
>> libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
>> libstdc++-v3/testsuite/29_atomics/headers/atomic/types_std_c++0x.cc
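
For readers who want to check which memory models land in which alternative,
here is a minimal C sketch of the split described in the commit message. It is
not code from the patch: the function names (matches_pf, load_mnemonic,
store_mnemonic, load_needs_low_register, check_model_split) are hypothetical,
and the sketch assumes the model is a plain __ATOMIC_* value, ignoring any
extra bits that memmodel_from_int masks off in GCC. The conditions are copied
from the Pf match_test and from the atomic_load/atomic_store output code in
the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of the Pf match_test: true for every memory model
   except relaxed, consume and release.  */
bool matches_pf(int model)
{
    return model != __ATOMIC_RELAXED
           && model != __ATOMIC_CONSUME
           && model != __ATOMIC_RELEASE;
}

/* Mnemonic picked by the atomic_load<mode> output code: plain ldr for
   relaxed, consume and release, lda for everything else.  */
const char *load_mnemonic(int model)
{
    return matches_pf(model) ? "lda" : "ldr";
}

/* Mnemonic picked by the atomic_store<mode> output code.  Note that the
   store template tests acquire, not release, before choosing str.  */
const char *store_mnemonic(int model)
{
    bool plain = model == __ATOMIC_RELAXED
                 || model == __ATOMIC_CONSUME
                 || model == __ATOMIC_ACQUIRE;
    return plain ? "str" : "stl";
}

/* On ARMv8-M Baseline, a load that does not match Pf falls to the "l"
   alternative, so its destination must be a low register (r0-r7).  */
bool load_needs_low_register(int model)
{
    return !matches_pf(model);
}

/* Small self-check covering all six memory models for loads.  */
void check_model_split(void)
{
    assert(strcmp(load_mnemonic(__ATOMIC_RELAXED), "ldr") == 0);
    assert(strcmp(load_mnemonic(__ATOMIC_CONSUME), "ldr") == 0);
    assert(strcmp(load_mnemonic(__ATOMIC_RELEASE), "ldr") == 0);
    assert(strcmp(load_mnemonic(__ATOMIC_ACQUIRE), "lda") == 0);
    assert(strcmp(load_mnemonic(__ATOMIC_ACQ_REL), "lda") == 0);
    assert(strcmp(load_mnemonic(__ATOMIC_SEQ_CST), "lda") == 0);
}
```

Calling check_model_split() from a test harness exercises the load side. The
store side is kept as a separate function because its str/stl condition in the
diff differs from the Pf test.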

Patch

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 4ece5f013c92adee04157b5c909e1d47c894c994..65098ceeb1a66174b345bcfb0688152f9f137150 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -34,11 +34,13 @@ 
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Do, Dv, Dy, Di, Dt, Dp, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
+;; in all states: Pf
 
 ;; The following memory constraints have been used:
-;; in ARM/Thumb-2 state: Q, Uh, Ut, Uv, Uy, Un, Um, Us
+;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
 ;; in Thumb state: Uu, Uw
+;; in all states: Q
 
 
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
@@ -180,6 +182,13 @@ 
   (and (match_code "const_int")
        (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
 
+(define_constraint "Pf"
+  "Memory models except relaxed, consume or release ones."
+  (and (match_code "const_int")
+       (match_test "!is_mm_relaxed (memmodel_from_int (ival))
+		    && !is_mm_consume (memmodel_from_int (ival))
+		    && !is_mm_release (memmodel_from_int (ival))")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
@@ -407,7 +416,7 @@ 
 
 (define_memory_constraint "Q"
  "@internal
-  In ARM/Thumb-2 state an address that is a single base register."
+  An address that is a single base register."
  (and (match_code "mem")
       (match_test "REG_P (XEXP (op, 0))")))
 
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index d10ede4175f94e627a23bf32d19d2b5f3de76771..d36c24f76f670d7602f766d7172286504faa7af5 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -63,37 +63,59 @@ 
    (set_attr "predicable" "no")])
 
 (define_insn "atomic_load<mode>"
-  [(set (match_operand:QHSI 0 "register_operand" "=r")
+  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q")
-       (match_operand:SI 2 "const_int_operand")]		;; model
+      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
+       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]	;; model
       VUNSPEC_LDA))]
   "TARGET_HAVE_LDACQ"
   {
     enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
     if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release (model))
-      return \"ldr<sync_sfx>%?\\t%0, %1\";
+      {
+	if (TARGET_THUMB1)
+	  return \"ldr<sync_sfx>\\t%0, %1\";
+	else
+	  return \"ldr<sync_sfx>%?\\t%0, %1\";
+      }
     else
-      return \"lda<sync_sfx>%?\\t%0, %1\";
+      {
+	if (TARGET_THUMB1)
+	  return \"lda<sync_sfx>\\t%0, %1\";
+	else
+	  return \"lda<sync_sfx>%?\\t%0, %1\";
+      }
   }
-  [(set_attr "predicable" "yes")
+  [(set_attr "arch" "32,v8mb,any")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "atomic_store<mode>"
-  [(set (match_operand:QHSI 0 "memory_operand" "=Q")
+  [(set (match_operand:QHSI 0 "memory_operand" "=Q,Q,Q")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "general_operand" "r")
-       (match_operand:SI 2 "const_int_operand")]		;; model
+      [(match_operand:QHSI 1 "general_operand" "r,r,l")
+       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]	;; model
       VUNSPEC_STL))]
   "TARGET_HAVE_LDACQ"
   {
     enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
     if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire (model))
-      return \"str<sync_sfx>%?\t%1, %0\";
+      {
+	if (TARGET_THUMB1)
+	  return \"str<sync_sfx>\t%1, %0\";
+	else
+	  return \"str<sync_sfx>%?\t%1, %0\";
+      }
     else
-      return \"stl<sync_sfx>%?\t%1, %0\";
+      {
+	if (TARGET_THUMB1)
+	  return \"stl<sync_sfx>\t%1, %0\";
+	else
+	  return \"stl<sync_sfx>%?\t%1, %0\";
+      }
   }
-  [(set_attr "predicable" "yes")
+  [(set_attr "arch" "32,v8mb,any")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 ;; An LDRD instruction usable by the atomic_loaddi expander on LPAE targets
@@ -380,45 +402,57 @@ 
   })
 
 (define_insn "arm_load_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
         (zero_extend:SI
 	  (unspec_volatile:NARROW
-	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
 	    VUNSPEC_LL)))]
   "TARGET_HAVE_LDREXBH"
-  "ldrex<sync_sfx>%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldrex<sync_sfx>%?\t%0, %C1
+   ldrex<sync_sfx>\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_acquire_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
         (zero_extend:SI
 	  (unspec_volatile:NARROW
-	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
 	    VUNSPEC_LAX)))]
   "TARGET_HAVE_LDACQ"
-  "ldaex<sync_sfx>%?\\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldaex<sync_sfx>%?\\t%0, %C1
+   ldaex<sync_sfx>\\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(unspec_volatile:SI
-	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
 	  VUNSPEC_LL))]
   "TARGET_HAVE_LDREX"
-  "ldrex%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldrex%?\t%0, %C1
+   ldrex\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_acquire_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(unspec_volatile:SI
-	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
 	  VUNSPEC_LAX))]
   "TARGET_HAVE_LDACQ"
-  "ldaex%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldaex%?\t%0, %C1
+   ldaex\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_exclusivedi"
@@ -460,7 +494,10 @@ 
 	gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
 	return "strexd%?\t%0, %2, %H2, %C1";
       }
-    return "strex<sync_sfx>%?\t%0, %2, %C1";
+    if (TARGET_THUMB1)
+      return "strex<sync_sfx>\t%0, %2, %C1";
+    else
+      return "strex<sync_sfx>%?\t%0, %2, %C1";
   }
   [(set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
@@ -482,13 +519,16 @@ 
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_store_release_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
+  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
 	(unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
-   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua")
+   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua,Ua")
 	(unspec_volatile:QHSI
-	  [(match_operand:QHSI 2 "s_register_operand" "r")]
+	  [(match_operand:QHSI 2 "s_register_operand" "r,r")]
 	  VUNSPEC_SLX))]
   "TARGET_HAVE_LDACQ"
-  "stlex<sync_sfx>%?\t%0, %2, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   stlex<sync_sfx>%?\t%0, %2, %C1
+   stlex<sync_sfx>\t%0, %2, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])