diff mbox series

ARM: Block predication on atomics [PR111235]

Message ID PAWPR08MB8982A6AA40749B74CAD14C5783EEA@PAWPR08MB8982.eurprd08.prod.outlook.com
State New
Headers show
Series ARM: Block predication on atomics [PR111235] | expand

Commit Message

Wilco Dijkstra Sept. 7, 2023, 2:06 p.m. UTC
The v7 memory ordering model allows reordering of conditional atomic instructions.
To avoid this, make all atomic patterns unconditional.  Expand atomic loads and
stores for all architectures so the memory access can be wrapped into an UNSPEC.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
        PR target/111235
	* config/arm/constraints.md: Remove Pf constraint.
        * onfig/arm/sync.md (arm_atomic_load<mode>): Add new pattern.
        (arm_atomic_load_acquire<mode>): Likewise.
        (arm_atomic_store<mode>): Likewise.
        (arm_atomic_store_release<mode>): Likewise.
        (atomic_load<mode>): Always expand atomic loads explicitly.
        (atomic_store<mode>): Always expand atomic stores explicitly.
        (arm_atomic_loaddi2_ldrd): Remove predication.
        (arm_load_exclusive<mode>): Likewise.
        (arm_load_acquire_exclusive<mode>): Likewise.
        (arm_load_exclusivesi): Likewise.
        (arm_load_acquire_exclusivesi: Likewise.
        (arm_load_exclusivedi): Likewise.
        (arm_load_acquire_exclusivedi): Likewise.
        (arm_store_exclusive<mode>): Likewise.
        (arm_store_release_exclusivedi): Likewise.
        (arm_store_release_exclusive<mode>): Likewise.
        * gcc/config/arm/unspecs.md: Add VUNSPEC_LDR and VUNSPEC_STR.

gcc/testsuite/ChangeLog/
        PR target/111235
	* gcc.target/arm/pr111235.c: Add new test.

---

Comments

Ramana Radhakrishnan Sept. 26, 2023, 10:52 p.m. UTC | #1
Reviewed-by: Ramana Radhakrishnan <ramana@gcc.gnu.org>

A very initial review here . I think it largely looks ok based on the
description but I've spotted a few obvious nits and things that come
to mind on reviewing this. I've not done a very deep review but hope
it helps you move forward. I'm happy to work with you on landing this
if that helps. I'll try and find some time tomorrow to look at this
again.

Hope this helps.


On Thu, Sep 7, 2023 at 3:07 PM Wilco Dijkstra via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> The v7 memory ordering model allows reordering of conditional atomic instructions.
> To avoid this, make all atomic patterns unconditional.  Expand atomic loads and
> stores for all architectures so the memory access can be wrapped into an UNSPEC.

>
> Passes regress/bootstrap, OK for commit?

Target ? armhf ? --with-arch , -with-fpu , -with-float parameters ?
Please be specific.


Since these patterns touch armv8m.baseline can you find all the
testcases in the testsuite and ensure no change in code for
armv8m.baseline as that's unpredicated already and this patch brings
this in line with the same ? Does the testsuite already cover these
arch variants and are you satisfied that the tests in the testsuite
can catch / don't make any additional code changes to the other
architectures affected by this ?


>
> gcc/ChangeLog/
>         PR target/111235
>         * config/arm/constraints.md: Remove Pf constraint.
>         * onfig/arm/sync.md (arm_atomic_load<mode>): Add new pattern.

Nit: s/onfig/config

>         (arm_atomic_load_acquire<mode>): Likewise.
>         (arm_atomic_store<mode>): Likewise.
>         (arm_atomic_store_release<mode>): Likewise.

Ok.

>         (atomic_load<mode>): Always expand atomic loads explicitly.
>         (atomic_store<mode>): Always expand atomic stores explicitly.

Nit: Change message to :

Switch patterns to define_expand.

>         (arm_atomic_loaddi2_ldrd): Remove predication.
>         (arm_load_exclusive<mode>): Likewise.
>         (arm_load_acquire_exclusive<mode>): Likewise.
>         (arm_load_exclusivesi): Likewise.
>         (arm_load_acquire_exclusivesi: Likewise.
>         (arm_load_exclusivedi): Likewise.
>         (arm_load_acquire_exclusivedi): Likewise.
>         (arm_store_exclusive<mode>): Likewise.
>         (arm_store_release_exclusivedi): Likewise.
>         (arm_store_release_exclusive<mode>): Likewise.
>         * gcc/config/arm/unspecs.md: Add VUNSPEC_LDR and VUNSPEC_STR.
>
> gcc/testsuite/ChangeLog/
>         PR target/111235
>         * gcc.target/arm/pr111235.c: Add new test.
>

Largely looks ok though I cannot work out tonight if we need more v8-a
or v8m-baseline specific tests for scan-assembler patterns.

Clearly our testsuite doesn't catch it , so perhaps the OP could help
validate this patch with their formal models to see if this fixes
these set of issues and creates no new regressions ? Is that feasible
to do ?

> ---
>
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index 05a4ebbdd67601d7b92aa44a619d17634cc69f17..d7c4a1b0cd785f276862048005e6cfa57cdcb20d 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -36,7 +36,7 @@
>  ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
>  ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
>  ;;                  Rg, Ri
> -;; in all states: Pf, Pg
> +;; in all states: Pg
>
>  ;; The following memory constraints have been used:
>  ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
> @@ -239,13 +239,6 @@ (define_constraint "Pe"
>    (and (match_code "const_int")
>         (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
>
> -(define_constraint "Pf"
> -  "Memory models except relaxed, consume or release ones."
> -  (and (match_code "const_int")
> -       (match_test "!is_mm_relaxed (memmodel_from_int (ival))
> -                   && !is_mm_consume (memmodel_from_int (ival))
> -                   && !is_mm_release (memmodel_from_int (ival))")))
> -
>  (define_constraint "Pg"
>    "@internal In Thumb-2 state a constant in range 1 to 32"
>    (and (match_code "const_int")
> diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
> index 7626bf3c443285dc63b4c4367b11a879a99c93c6..2210810f67f37ce043b8fdc73b4f21b54c5b1912 100644
> --- a/gcc/config/arm/sync.md
> +++ b/gcc/config/arm/sync.md
> @@ -62,68 +62,110 @@ (define_insn "*memory_barrier"
>     (set_attr "conds" "unconditional")
>     (set_attr "predicable" "no")])
>
> -(define_insn "atomic_load<mode>"
> -  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
> +(define_insn "arm_atomic_load<mode>"
> +  [(set (match_operand:QHSI 0 "register_operand" "=r,l")
>      (unspec_volatile:QHSI
> -      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
> -       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]      ;; model
> +      [(match_operand:QHSI 1 "memory_operand" "m,m")]
]


Remind me again why is it safe to go from the Q constraint to the m
constraint here and everywhere else you've done this ?

> +  ""
> +  "ldr<sync_sfx>\t%0, %1"
> +  [(set_attr "arch" "32,any")])
> +
> +(define_insn "arm_atomic_load_acquire<mode>"
> +  [(set (match_operand:QHSI 0 "register_operand" "=r")
> +    (unspec_volatile:QHSI
> +      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q")]
>        VUNSPEC_LDA))]
>    "TARGET_HAVE_LDACQ"
> -  {
> -    if (aarch_mm_needs_acquire (operands[2]))
> -      {
> -       if (TARGET_THUMB1)
> -         return "lda<sync_sfx>\t%0, %1";
> -       else
> -         return "lda<sync_sfx>%?\t%0, %1";
> -      }
> -    else
> -      {
> -       if (TARGET_THUMB1)
> -         return "ldr<sync_sfx>\t%0, %1";
> -       else
> -         return "ldr<sync_sfx>%?\t%0, %1";
> -      }
> -  }
> -  [(set_attr "arch" "32,v8mb,any")
> -   (set_attr "predicable" "yes")])
> +  "lda<sync_sfx>\t%0, %C1"
> +)
>
> -(define_insn "atomic_store<mode>"
> -  [(set (match_operand:QHSI 0 "memory_operand" "=Q,Q,Q")
> +(define_insn "arm_atomic_store<mode>"
> +  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
> +    (unspec_volatile:QHSI
> +      [(match_operand:QHSI 1 "register_operand" "r,l")]
> +      VUNSPEC_STR))]
> +  ""
> +  "str<sync_sfx>\t%1, %0";
> +  [(set_attr "arch" "32,any")])
> +
> +(define_insn "arm_atomic_store_release<mode>"
> +  [(set (match_operand:QHSI 0 "arm_sync_memory_operand" "=Q")
>      (unspec_volatile:QHSI
> -      [(match_operand:QHSI 1 "general_operand" "r,r,l")
> -       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]      ;; model
> +      [(match_operand:QHSI 1 "register_operand" "r")]
>        VUNSPEC_STL))]
>    "TARGET_HAVE_LDACQ"
> -  {
> -    if (aarch_mm_needs_release (operands[2]))
> -      {
> -       if (TARGET_THUMB1)
> -         return "stl<sync_sfx>\t%1, %0";
> -       else
> -         return "stl<sync_sfx>%?\t%1, %0";
> -      }
> -    else
> -      {
> -       if (TARGET_THUMB1)
> -         return "str<sync_sfx>\t%1, %0";
> -       else
> -         return "str<sync_sfx>%?\t%1, %0";
> -      }
> -  }
> -  [(set_attr "arch" "32,v8mb,any")
> -   (set_attr "predicable" "yes")])
> +  "stl<sync_sfx>\t%1, %C0")
> +
> +
> +(define_expand "atomic_load<mode>"
> +  [(match_operand:QHSI 0 "register_operand")           ;; val out
> +   (match_operand:QHSI 1 "arm_sync_memory_operand")    ;; memory
> +   (match_operand:SI 2 "const_int_operand")]           ;; model
> +  ""
> +{
> +  memmodel model = memmodel_from_int (INTVAL (operands[2]));
> +
> +  if (TARGET_HAVE_LDACQ && !is_mm_relaxed (model))
> +    {
> +      emit_insn (gen_arm_atomic_load_acquire<mode> (operands[0], operands[1]));
> +      DONE;
> +    }
> +
> +  /* The seq_cst model needs a barrier before the load to block reordering with
> +     earlier accesses.  */
> +  if (is_mm_seq_cst (model))
> +    expand_mem_thread_fence (model);
> +
> +  emit_insn (gen_arm_atomic_load<mode> (operands[0], operands[1]));
> +
> +  /* All non-relaxed models need a barrier after the load when load-acquire
> +     instructions are not available.  */
> +  if (!is_mm_relaxed (model))
> +    expand_mem_thread_fence (model);
> +
> +  DONE;
> +})
> +
> +(define_expand "atomic_store<mode>"
> +  [(match_operand:QHSI 0 "arm_sync_memory_operand")    ;; memory
> +   (match_operand:QHSI 1 "register_operand")           ;; store value
> +   (match_operand:SI 2 "const_int_operand")]           ;; model
> +  ""
> +{
> +  memmodel model = memmodel_from_int (INTVAL (operands[2]));
> +
> +  if (TARGET_HAVE_LDACQ && !is_mm_relaxed (model))
> +    {
> +      emit_insn (gen_arm_atomic_store_release<mode> (operands[0], operands[1]));
> +      DONE;
> +    }
> +
> +  /* All non-relaxed models need a barrier after the load when load-acquire
> +     instructions are not available.  */
> +  if (!is_mm_relaxed (model))
> +    expand_mem_thread_fence (model);
> +
> +  emit_insn (gen_arm_atomic_store<mode> (operands[0], operands[1]));
> +
> +  /* The seq_cst model needs a barrier after the store to block reordering with
> +     later accesses.  */
> +  if (is_mm_seq_cst (model))
> +    expand_mem_thread_fence (model);
> +
> +  DONE;
> +})
>
>  ;; An LDRD instruction usable by the atomic_loaddi expander on LPAE targets
>
>  (define_insn "arm_atomic_loaddi2_ldrd"
>    [(set (match_operand:DI 0 "register_operand" "=r")
>         (unspec_volatile:DI
> -         [(match_operand:DI 1 "arm_sync_memory_operand" "Q")]
> +         [(match_operand:DI 1 "memory_operand" "m")]
>             VUNSPEC_LDRD_ATOMIC))]
>    "ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_LPAE"
> -  "ldrd%?\t%0, %H0, %C1"
> -  [(set_attr "predicable" "yes")])
> +  "ldrd\t%0, %H0, %1"
> +)
>
>  ;; There are three ways to expand this depending on the architecture
>  ;; features available.  As for the barriers, a load needs a barrier
> @@ -152,6 +194,11 @@ (define_expand "atomic_loaddi"
>        DONE;
>      }
>
> +  /* The seq_cst model needs a barrier before the load to block reordering with
> +     earlier accesses.  */
> +  if (is_mm_seq_cst (model))
> +    expand_mem_thread_fence (model);
> +
>    /* On LPAE targets LDRD and STRD accesses to 64-bit aligned
>       locations are 64-bit single-copy atomic.  We still need barriers in the
>       appropriate places to implement the ordering constraints.  */
> @@ -160,7 +207,6 @@ (define_expand "atomic_loaddi"
>    else
>      emit_insn (gen_arm_load_exclusivedi (operands[0], operands[1]));
>
> -
>    /* All non-relaxed models need a barrier after the load when load-acquire
>       instructions are not available.  */
>    if (!is_mm_relaxed (model))
> @@ -446,54 +492,42 @@ (define_insn_and_split "atomic_nand_fetch<mode>"
>    [(set_attr "arch" "32,v8mb")])
>
>  (define_insn "arm_load_exclusive<mode>"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>          (zero_extend:SI
>           (unspec_volatile:NARROW
> -           [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
> +           [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
>             VUNSPEC_LL)))]
>    "TARGET_HAVE_LDREXBH"
> -  "@
> -   ldrex<sync_sfx>%?\t%0, %C1
> -   ldrex<sync_sfx>\t%0, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "ldrex<sync_sfx>\t%0, %C1"
> +)
>
>  (define_insn "arm_load_acquire_exclusive<mode>"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>          (zero_extend:SI
>           (unspec_volatile:NARROW
> -           [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
> +           [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
>             VUNSPEC_LAX)))]
>    "TARGET_HAVE_LDACQ"
> -  "@
> -   ldaex<sync_sfx>%?\\t%0, %C1
> -   ldaex<sync_sfx>\\t%0, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "ldaex<sync_sfx>\\t%0, %C1"
> +)
>
>  (define_insn "arm_load_exclusivesi"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>         (unspec_volatile:SI
> -         [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
> +         [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
>           VUNSPEC_LL))]
>    "TARGET_HAVE_LDREX"
> -  "@
> -   ldrex%?\t%0, %C1
> -   ldrex\t%0, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "ldrex\t%0, %C1"
> +)
>
>  (define_insn "arm_load_acquire_exclusivesi"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>         (unspec_volatile:SI
> -         [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
> +         [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
>           VUNSPEC_LAX))]
>    "TARGET_HAVE_LDACQ"
> -  "@
> -   ldaex%?\t%0, %C1
> -   ldaex\t%0, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "ldaex\t%0, %C1"
> +)
>
>  (define_insn "arm_load_exclusivedi"
>    [(set (match_operand:DI 0 "s_register_operand" "=r")
> @@ -501,8 +535,8 @@ (define_insn "arm_load_exclusivedi"
>           [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
>           VUNSPEC_LL))]
>    "TARGET_HAVE_LDREXD"
> -  "ldrexd%?\t%0, %H0, %C1"
> -  [(set_attr "predicable" "yes")])
> +  "ldrexd\t%0, %H0, %C1"
> +)
>
>  (define_insn "arm_load_acquire_exclusivedi"
>    [(set (match_operand:DI 0 "s_register_operand" "=r")
> @@ -510,8 +544,8 @@ (define_insn "arm_load_acquire_exclusivedi"
>           [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
>           VUNSPEC_LAX))]
>    "TARGET_HAVE_LDACQEXD && ARM_DOUBLEWORD_ALIGN"
> -  "ldaexd%?\t%0, %H0, %C1"
> -  [(set_attr "predicable" "yes")])
> +  "ldaexd\t%0, %H0, %C1"
> +)
>
>  (define_insn "arm_store_exclusive<mode>"
>    [(set (match_operand:SI 0 "s_register_operand" "=&r")
> @@ -530,14 +564,11 @@ (define_insn "arm_store_exclusive<mode>"
>            Note that the 1st register always gets the
>            lowest word in memory.  */
>         gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
> -       return "strexd%?\t%0, %2, %H2, %C1";
> +       return "strexd\t%0, %2, %H2, %C1";
>        }
> -    if (TARGET_THUMB1)
> -      return "strex<sync_sfx>\t%0, %2, %C1";
> -    else
> -      return "strex<sync_sfx>%?\t%0, %2, %C1";
> +    return "strex<sync_sfx>\t%0, %2, %C1";
>    }
> -  [(set_attr "predicable" "yes")])
> +)
>
>  (define_insn "arm_store_release_exclusivedi"
>    [(set (match_operand:SI 0 "s_register_operand" "=&r")
> @@ -550,20 +581,16 @@ (define_insn "arm_store_release_exclusivedi"
>    {
>      /* See comment in arm_store_exclusive<mode> above.  */
>      gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
> -    return "stlexd%?\t%0, %2, %H2, %C1";
> +    return "stlexd\t%0, %2, %H2, %C1";
>    }
> -  [(set_attr "predicable" "yes")])
> +)
>
>  (define_insn "arm_store_release_exclusive<mode>"
> -  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=&r")
>         (unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
> -   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua,Ua")
> +   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua")
>         (unspec_volatile:QHSI
> -         [(match_operand:QHSI 2 "s_register_operand" "r,r")]
> +         [(match_operand:QHSI 2 "s_register_operand" "r")]
>           VUNSPEC_SLX))]
>    "TARGET_HAVE_LDACQ"
> -  "@
> -   stlex<sync_sfx>%?\t%0, %2, %C1
> -   stlex<sync_sfx>\t%0, %2, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "stlex<sync_sfx>\t%0, %2, %C1")
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index dccda283573208bd5db4f04c10ae9dbbfdda49dd..ba86ce0992f2b14a5e65ac09784b3fb3539de035 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -221,7 +221,9 @@ (define_c_enum "unspecv" [
>    VUNSPEC_SC           ; Represent a store-register-exclusive.
>    VUNSPEC_LAX          ; Represent a load-register-acquire-exclusive.
>    VUNSPEC_SLX          ; Represent a store-register-release-exclusive.
> -  VUNSPEC_LDA          ; Represent a store-register-acquire.
> +  VUNSPEC_LDR          ; Represent a load-register-relaxed.
> +  VUNSPEC_LDA          ; Represent a load-register-acquire.

Nit: LDA before LDR ? Though I suspect this list can be alphabetically
ordered at another point of time.

> +  VUNSPEC_STR          ; Represent a store-register-relaxed.
>    VUNSPEC_STL          ; Represent a store-register-release.

Nit : STL before STR ? Lets atleast do an insertion sort here :)

>    VUNSPEC_GET_FPSCR    ; Represent fetch of FPSCR content.
>    VUNSPEC_SET_FPSCR    ; Represent assign of FPSCR content.
> diff --git a/gcc/testsuite/gcc.target/arm/pr111235.c b/gcc/testsuite/gcc.target/arm/pr111235.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..923b231afa888d326bcdc0ecabbcf8ba223d365a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr111235.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c11 -O" } */
> +/* { dg-require-effective-target arm_arch_v7a_ok } */
> +/* { dg-add-options arm_arch_v7a } */
> +
> +int t0 (int *p, int x)
> +{
> +  if (x > 100)
> +    x = atomic_load_explicit (p, memory_order_relaxed);
> +  return x + 1;
> +}
> +
> +long long t1 (long long *p, int x)
> +{
> +  if (x > 100)
> +    x = atomic_load_explicit (p, memory_order_relaxed);
> +  return x + 1;
> +}
> +
> +void t2 (int *p, int x)
> +{
> +  if (x > 100)
> +    atomic_store_explicit (p, x, memory_order_relaxed);
> +}
> +
> +void t3 (long long *p, long long x)
> +{
> +  if (x > 100)
> +    atomic_store_explicit (p, x, memory_order_relaxed);
> +}
> +
> +
> +/* { dg-final { scan-assembler-times "ldrexd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 1 } } */
> +/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
> +

There are new tests added for v7-a , what happens with the output for
v8-a and the changes for ldacq and other such instructions ?


regards

Ramana
Wilco Dijkstra Sept. 27, 2023, 7:40 p.m. UTC | #2
Hi Ramana,

> Hope this helps.

Yes definitely!

>> Passes regress/bootstrap, OK for commit?
>
> Target ? armhf ? --with-arch , -with-fpu , -with-float parameters ?
> Please be specific.

I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
--build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
default armhf settings are incorrect. I shouldn't need the --with-float=hard since
that is obviously implied by armhf, and they should also imply armv7-a with vfpv3
according to documentation. It seems to get confused and skip some tests. I tried
using --with-fpu=auto, but that doesn't work at all, so in the end I forced it like:
--with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more tests.

> Since these patterns touch armv8m.baseline can you find all the
> testcases in the testsuite and ensure no change in code for
> armv8m.baseline as that's unpredicated already and this patch brings
> this in line with the same ? Does the testsuite already cover these
> arch variants and are you satisfied that the tests in the testsuite
> can catch / don't make any additional code changes to the other
> architectures affected by this ?

There are various v8-m(.base/.main) tests and they all pass. The generated
code is generally unchanged if there was no conditional execution. I made
the new UNSPEC_LDR/STR patterns support offsets so there is no difference
in generated code for relaxed loads/stores (since they used to use a plain
load/store which has an immediate offset).

>>         * onfig/arm/sync.md (arm_atomic_load<mode>): Add new pattern.
>
> Nit: s/onfig/config

Fixed.

>>         (atomic_load<mode>): Always expand atomic loads explicitly.
>>         (atomic_store<mode>): Always expand atomic stores explicitly.
>
> Nit: Change message to :
> Switch patterns to define_expand.

Fixed.

> Largely looks ok though I cannot work out tonight if we need more v8-a
> or v8m-baseline specific tests for scan-assembler patterns.
>
> Clearly our testsuite doesn't catch it , so perhaps the OP could help
> validate this patch with their formal models to see if this fixes
> these set of issues and creates no new regressions ? Is that feasible
> to do ?

Disabling conditional execution avoids the issue. It's trivial to verify that
atomics can no longer be conditionally executed (no "%?"). When this is
committed, we can run the random testing again to confirm the issue
is no longer present.

> -(define_insn "atomic_load<mode>"
> -  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
> +(define_insn "arm_atomic_load<mode>"
> +  [(set (match_operand:QHSI 0 "register_operand" "=r,l")
>      (unspec_volatile:QHSI
> -      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
> -       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]      ;; model
> +      [(match_operand:QHSI 1 "memory_operand" "m,m")]
>
> Remind me again why is it safe to go from the Q constraint to the m
> constraint here and everywhere else you've done this ?

That's because the relaxed loads/stores use LDR/STR wrapped in an
UNSPEC. To avoid regressions we have to use 'm' so that an immediate
offset can be merged into the memory access.

>> -  VUNSPEC_LDA          ; Represent a store-register-acquire.
>> +  VUNSPEC_LDR          ; Represent a load-register-relaxed.
>> +  VUNSPEC_LDA          ; Represent a load-register-acquire.
>
> Nit: LDA before LDR ? Though I suspect this list can be alphabetically
> ordered at another point of time.

Swapped.

> There are new tests added for v7-a , what happens with the output for
> v8-a and the changes for ldacq and other such instructions ?

v7-a and v8-a generate the same instructions for relaxed load/store.
The acquire/release versions are identical except they are no longer
predicated. Basically the new patterns are not only significantly simpler,
they are now the same between the many ARM/Thumb-2/v7-a/v8-m/v8-a
combinations, so test coverage is much higher now. This is how these
patterns should have been designed all along.

v2 follows below.

Cheers,
Wilco


[PATCH v2] ARM: Block predication on atomics [PR111235]

The v7 memory ordering model allows reordering of conditional atomic
instructions.  To avoid this, make all atomic patterns unconditional.
Expand atomic loads and stores for all architectures so the memory access
can be wrapped into an UNSPEC.

gcc/ChangeLog/
        PR target/111235
        * config/arm/constraints.md: Remove Pf constraint.
        * config/arm/sync.md (arm_atomic_load<mode>): Add new pattern.
        (arm_atomic_load_acquire<mode>): Likewise.
        (arm_atomic_store<mode>): Likewise.
        (arm_atomic_store_release<mode>): Likewise.
        (atomic_load<mode>): Switch patterns to define_expand.
        (atomic_store<mode>): Likewise.
        (arm_atomic_loaddi2_ldrd): Remove predication.
        (arm_load_exclusive<mode>): Likewise.
        (arm_load_acquire_exclusive<mode>): Likewise.
        (arm_load_exclusivesi): Likewise.
        (arm_load_acquire_exclusivesi: Likewise.
        (arm_load_exclusivedi): Likewise.
        (arm_load_acquire_exclusivedi): Likewise.
        (arm_store_exclusive<mode>): Likewise.
        (arm_store_release_exclusivedi): Likewise.
        (arm_store_release_exclusive<mode>): Likewise.
        * gcc/config/arm/unspecs.md: Add VUNSPEC_LDR and VUNSPEC_STR.

gcc/testsuite/ChangeLog/
        PR target/111235
        * gcc.dg/rtl/arm/stl-cond.c: Remove test.
        * gcc.target/arm/atomic_loaddi_7.c: Fix dmb count.
        * gcc.target/arm/atomic_loaddi_8.c: Likewise.
        * gcc.target/arm/pr111235.c: Add new test.

---

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 05a4ebbdd67601d7b92aa44a619d17634cc69f17..d7c4a1b0cd785f276862048005e6cfa57cdcb20d 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
 ;;		     Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pg
 
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -239,13 +239,6 @@ (define_constraint "Pe"
   (and (match_code "const_int")
        (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
 
-(define_constraint "Pf"
-  "Memory models except relaxed, consume or release ones."
-  (and (match_code "const_int")
-       (match_test "!is_mm_relaxed (memmodel_from_int (ival))
-		    && !is_mm_consume (memmodel_from_int (ival))
-		    && !is_mm_release (memmodel_from_int (ival))")))
-
 (define_constraint "Pg"
   "@internal In Thumb-2 state a constant in range 1 to 32"
   (and (match_code "const_int")
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 7626bf3c443285dc63b4c4367b11a879a99c93c6..2210810f67f37ce043b8fdc73b4f21b54c5b1912 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -62,68 +62,110 @@ (define_insn "*memory_barrier"
    (set_attr "conds" "unconditional")
    (set_attr "predicable" "no")])
 
-(define_insn "atomic_load<mode>"
-  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
+(define_insn "arm_atomic_load<mode>"
+  [(set (match_operand:QHSI 0 "register_operand" "=r,l")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
-       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]	;; model
+      [(match_operand:QHSI 1 "memory_operand" "m,m")]
+      VUNSPEC_LDR))]
+  ""
+  "ldr<sync_sfx>\t%0, %1"
+  [(set_attr "arch" "32,any")])
+
+(define_insn "arm_atomic_load_acquire<mode>"
+  [(set (match_operand:QHSI 0 "register_operand" "=r")
+    (unspec_volatile:QHSI
+      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q")]
       VUNSPEC_LDA))]
   "TARGET_HAVE_LDACQ"
-  {
-    if (aarch_mm_needs_acquire (operands[2]))
-      {
-	if (TARGET_THUMB1)
-	  return "lda<sync_sfx>\t%0, %1";
-	else
-	  return "lda<sync_sfx>%?\t%0, %1";
-      }
-    else
-      {
-	if (TARGET_THUMB1)
-	  return "ldr<sync_sfx>\t%0, %1";
-	else
-	  return "ldr<sync_sfx>%?\t%0, %1";
-      }
-  }
-  [(set_attr "arch" "32,v8mb,any")
-   (set_attr "predicable" "yes")])
+  "lda<sync_sfx>\t%0, %C1"
+)
 
-(define_insn "atomic_store<mode>"
-  [(set (match_operand:QHSI 0 "memory_operand" "=Q,Q,Q")
+(define_insn "arm_atomic_store<mode>"
+  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
+    (unspec_volatile:QHSI
+      [(match_operand:QHSI 1 "register_operand" "r,l")]
+      VUNSPEC_STR))]
+  ""
+  "str<sync_sfx>\t%1, %0";
+  [(set_attr "arch" "32,any")])
+
+(define_insn "arm_atomic_store_release<mode>"
+  [(set (match_operand:QHSI 0 "arm_sync_memory_operand" "=Q")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "general_operand" "r,r,l")
-       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]	;; model
+      [(match_operand:QHSI 1 "register_operand" "r")]
       VUNSPEC_STL))]
   "TARGET_HAVE_LDACQ"
-  {
-    if (aarch_mm_needs_release (operands[2]))
-      {
-	if (TARGET_THUMB1)
-	  return "stl<sync_sfx>\t%1, %0";
-	else
-	  return "stl<sync_sfx>%?\t%1, %0";
-      }
-    else
-      {
-	if (TARGET_THUMB1)
-	  return "str<sync_sfx>\t%1, %0";
-	else
-	  return "str<sync_sfx>%?\t%1, %0";
-      }
-  }
-  [(set_attr "arch" "32,v8mb,any")
-   (set_attr "predicable" "yes")])
+  "stl<sync_sfx>\t%1, %C0")
+
+
+(define_expand "atomic_load<mode>"
+  [(match_operand:QHSI 0 "register_operand")		;; val out
+   (match_operand:QHSI 1 "arm_sync_memory_operand")	;; memory
+   (match_operand:SI 2 "const_int_operand")]		;; model
+  ""
+{
+  memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+  if (TARGET_HAVE_LDACQ && !is_mm_relaxed (model))
+    {
+      emit_insn (gen_arm_atomic_load_acquire<mode> (operands[0], operands[1]));
+      DONE;
+    }
+
+  /* The seq_cst model needs a barrier before the load to block reordering with
+     earlier accesses.  */
+  if (is_mm_seq_cst (model))
+    expand_mem_thread_fence (model);
+
+  emit_insn (gen_arm_atomic_load<mode> (operands[0], operands[1]));
+
+  /* All non-relaxed models need a barrier after the load when load-acquire
+     instructions are not available.  */
+  if (!is_mm_relaxed (model))
+    expand_mem_thread_fence (model);
+
+  DONE;
+})
+
+(define_expand "atomic_store<mode>"
+  [(match_operand:QHSI 0 "arm_sync_memory_operand")	;; memory
+   (match_operand:QHSI 1 "register_operand")		;; store value
+   (match_operand:SI 2 "const_int_operand")]		;; model
+  ""
+{
+  memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+  if (TARGET_HAVE_LDACQ && !is_mm_relaxed (model))
+    {
+      emit_insn (gen_arm_atomic_store_release<mode> (operands[0], operands[1]));
+      DONE;
+    }
+
+  /* All non-relaxed models need a barrier after the load when load-acquire
+     instructions are not available.  */
+  if (!is_mm_relaxed (model))
+    expand_mem_thread_fence (model);
+
+  emit_insn (gen_arm_atomic_store<mode> (operands[0], operands[1]));
+
+  /* The seq_cst model needs a barrier after the store to block reordering with
+     later accesses.  */
+  if (is_mm_seq_cst (model))
+    expand_mem_thread_fence (model);
+
+  DONE;
+})
 
 ;; An LDRD instruction usable by the atomic_loaddi expander on LPAE targets
 
 (define_insn "arm_atomic_loaddi2_ldrd"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec_volatile:DI
-	  [(match_operand:DI 1 "arm_sync_memory_operand" "Q")]
+	  [(match_operand:DI 1 "memory_operand" "m")]
 	    VUNSPEC_LDRD_ATOMIC))]
   "ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_LPAE"
-  "ldrd%?\t%0, %H0, %C1"
-  [(set_attr "predicable" "yes")])
+  "ldrd\t%0, %H0, %1"
+)
 
 ;; There are three ways to expand this depending on the architecture
 ;; features available.  As for the barriers, a load needs a barrier
@@ -152,6 +194,11 @@ (define_expand "atomic_loaddi"
       DONE;
     }
 
+  /* The seq_cst model needs a barrier before the load to block reordering with
+     earlier accesses.  */
+  if (is_mm_seq_cst (model))
+    expand_mem_thread_fence (model);
+
   /* On LPAE targets LDRD and STRD accesses to 64-bit aligned
      locations are 64-bit single-copy atomic.  We still need barriers in the
      appropriate places to implement the ordering constraints.  */
@@ -160,7 +207,6 @@ (define_expand "atomic_loaddi"
   else
     emit_insn (gen_arm_load_exclusivedi (operands[0], operands[1]));
 
-
   /* All non-relaxed models need a barrier after the load when load-acquire
      instructions are not available.  */
   if (!is_mm_relaxed (model))
@@ -446,54 +492,42 @@ (define_insn_and_split "atomic_nand_fetch<mode>"
   [(set_attr "arch" "32,v8mb")])
 
 (define_insn "arm_load_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
         (zero_extend:SI
 	  (unspec_volatile:NARROW
-	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
 	    VUNSPEC_LL)))]
   "TARGET_HAVE_LDREXBH"
-  "@
-   ldrex<sync_sfx>%?\t%0, %C1
-   ldrex<sync_sfx>\t%0, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "ldrex<sync_sfx>\t%0, %C1"
+)
 
 (define_insn "arm_load_acquire_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
         (zero_extend:SI
 	  (unspec_volatile:NARROW
-	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
 	    VUNSPEC_LAX)))]
   "TARGET_HAVE_LDACQ"
-  "@
-   ldaex<sync_sfx>%?\\t%0, %C1
-   ldaex<sync_sfx>\\t%0, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "ldaex<sync_sfx>\\t%0, %C1"
+)
 
 (define_insn "arm_load_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec_volatile:SI
-	  [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
 	  VUNSPEC_LL))]
   "TARGET_HAVE_LDREX"
-  "@
-   ldrex%?\t%0, %C1
-   ldrex\t%0, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "ldrex\t%0, %C1"
+)
 
 (define_insn "arm_load_acquire_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec_volatile:SI
-	  [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
 	  VUNSPEC_LAX))]
   "TARGET_HAVE_LDACQ"
-  "@
-   ldaex%?\t%0, %C1
-   ldaex\t%0, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "ldaex\t%0, %C1"
+)
 
 (define_insn "arm_load_exclusivedi"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
@@ -501,8 +535,8 @@ (define_insn "arm_load_exclusivedi"
 	  [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
 	  VUNSPEC_LL))]
   "TARGET_HAVE_LDREXD"
-  "ldrexd%?\t%0, %H0, %C1"
-  [(set_attr "predicable" "yes")])
+  "ldrexd\t%0, %H0, %C1"
+)
 
 (define_insn "arm_load_acquire_exclusivedi"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
@@ -510,8 +544,8 @@ (define_insn "arm_load_acquire_exclusivedi"
 	  [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
 	  VUNSPEC_LAX))]
   "TARGET_HAVE_LDACQEXD && ARM_DOUBLEWORD_ALIGN"
-  "ldaexd%?\t%0, %H0, %C1"
-  [(set_attr "predicable" "yes")])
+  "ldaexd\t%0, %H0, %C1"
+)
 
 (define_insn "arm_store_exclusive<mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
@@ -530,14 +564,11 @@ (define_insn "arm_store_exclusive<mode>"
 	   Note that the 1st register always gets the
 	   lowest word in memory.  */
 	gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
-	return "strexd%?\t%0, %2, %H2, %C1";
+	return "strexd\t%0, %2, %H2, %C1";
       }
-    if (TARGET_THUMB1)
-      return "strex<sync_sfx>\t%0, %2, %C1";
-    else
-      return "strex<sync_sfx>%?\t%0, %2, %C1";
+    return "strex<sync_sfx>\t%0, %2, %C1";
   }
-  [(set_attr "predicable" "yes")])
+)
 
 (define_insn "arm_store_release_exclusivedi"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
@@ -550,20 +581,16 @@ (define_insn "arm_store_release_exclusivedi"
   {
     /* See comment in arm_store_exclusive<mode> above.  */
     gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
-    return "stlexd%?\t%0, %2, %H2, %C1";
+    return "stlexd\t%0, %2, %H2, %C1";
   }
-  [(set_attr "predicable" "yes")])
+)
 
 (define_insn "arm_store_release_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
+  [(set (match_operand:SI 0 "s_register_operand" "=&r")
 	(unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
-   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua,Ua")
+   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua")
 	(unspec_volatile:QHSI
-	  [(match_operand:QHSI 2 "s_register_operand" "r,r")]
+	  [(match_operand:QHSI 2 "s_register_operand" "r")]
 	  VUNSPEC_SLX))]
   "TARGET_HAVE_LDACQ"
-  "@
-   stlex<sync_sfx>%?\t%0, %2, %C1
-   stlex<sync_sfx>\t%0, %2, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "stlex<sync_sfx>\t%0, %2, %C1")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 6a5b1f8f623b00dc014c9f829c591477f8faa9a1..4713ec840abae48ca70f418dbc0d4028ad4ad527 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -221,8 +221,10 @@ (define_c_enum "unspecv" [
   VUNSPEC_SC		; Represent a store-register-exclusive.
   VUNSPEC_LAX		; Represent a load-register-acquire-exclusive.
   VUNSPEC_SLX		; Represent a store-register-release-exclusive.
-  VUNSPEC_LDA		; Represent a store-register-acquire.
+  VUNSPEC_LDA		; Represent a load-register-acquire.
+  VUNSPEC_LDR		; Represent a load-register-relaxed.
   VUNSPEC_STL		; Represent a store-register-release.
+  VUNSPEC_STR		; Represent a store-register-relaxed.
   VUNSPEC_GET_FPSCR	; Represent fetch of FPSCR content.
   VUNSPEC_SET_FPSCR	; Represent assign of FPSCR content.
   VUNSPEC_SET_FPSCR_NZCVQC	; Represent assign of FPSCR_nzcvqc content.
diff --git a/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c b/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c
deleted file mode 100644
index e47ca6bf36dac500b90d80e52e150a19c95c7219..0000000000000000000000000000000000000000
--- a/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c
+++ /dev/null
@@ -1,61 +0,0 @@
-/* { dg-do compile { target arm*-*-* } } */
-/* { dg-require-effective-target arm_arm_ok } */
-/* { dg-require-effective-target arm_arch_v8a_ok } */
-/* { dg-options "-O2 -marm" } */
-/* { dg-add-options arm_arch_v8a } */
-
-/* We want to test that the STL instruction gets the conditional
-   suffix when under a COND_EXEC.  However, COND_EXEC is very hard to
-   generate from C code because the atomic_store expansion adds a compiler
-   barrier before the insn, preventing if-conversion.  So test the output
-   here with a hand-crafted COND_EXEC wrapped around an STL.  */
-
-void __RTL (startwith ("final")) foo (int *a, int b)
-{
-(function "foo"
-  (param "a"
-    (DECL_RTL (reg/v:SI r0))
-    (DECL_RTL_INCOMING (reg:SI r0))
-  )
-  (param "b"
-    (DECL_RTL (reg/v:SI r1))
-    (DECL_RTL_INCOMING (reg:SI r1))
-  )
-  (insn-chain
-    (block 2
-	(edge-from entry (flags "FALLTHRU"))
-	(cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
-
-  (insn:TI 7 (parallel [
-	(set (reg:CC cc)
-	     (compare:CC (reg:SI r1)
-			 (const_int 0)))
-	(set (reg/v:SI r1)
-	     (reg:SI r1 ))
-        ])  ;; {*movsi_compare0}
-     (nil))
-
-  ;; A conditional atomic store-release: STLNE for Armv8-A.
-  (insn 10 (cond_exec (ne (reg:CC cc)
-	   (const_int 0))
-	(set (mem/v:SI (reg/v/f:SI r0) [-1  S4 A32])
-		(unspec_volatile:SI [
-		(reg/v:SI r1)
-		(const_int 3)
-		] VUNSPEC_STL))) ;; {*p atomic_storesi}
-	(expr_list:REG_DEAD (reg:CC cc)
-	(expr_list:REG_DEAD (reg/v:SI r1)
-	(expr_list:REG_DEAD (reg/v/f:SI r0)
-		(nil)))))
-      (edge-to exit (flags "FALLTHRU"))
-    ) ;; block 2
-  ) ;; insn-chain
-  (crtl
-    (return_rtx
-      (reg/i:SI r0)
-    ) ;; return_rtx
-  ) ;; crtl
-) ;; function
-}
-
-/* { dg-final { scan-assembler "stlne" } } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c b/gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
index 6743663f1e6831d3fc441e80fcfe3d761987970b..79e36edbb8fa86712b8d283b9a81b794610f4813 100644
--- a/gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
+++ b/gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
@@ -6,4 +6,4 @@
 #include "atomic_loaddi_seq_cst.x"
 
 /* { dg-final { scan-assembler-times "ldrexd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 1 } } */
-/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c b/gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
index f7bd3e5a2b515d91418f5f4a7a8db00336a8b594..7241d361313997b0133056c1a3a7c1b546708e43 100644
--- a/gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
+++ b/gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
@@ -6,4 +6,4 @@
 #include "atomic_loaddi_seq_cst.x"
 
 /* { dg-final { scan-assembler-times "ldrd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 1 } } */
-/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr111235.c b/gcc/testsuite/gcc.target/arm/pr111235.c
new file mode 100644
index 0000000000000000000000000000000000000000..b06a5bfb8e294edf04ec9b7b6d8fe5d1ac73b1e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr111235.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_arch_v7a_ok } */
+/* { dg-add-options arm_arch_v7a } */
+
+#include <stdatomic.h>
+
+int t0 (int *p, int x)
+{
+  if (x > 100)
+    x = atomic_load_explicit (p, memory_order_relaxed);
+  return x + 1;
+}
+
+long long t1 (long long *p, int x)
+{
+  if (x > 100)
+    x = atomic_load_explicit (p, memory_order_relaxed);
+  return x + 1;
+}
+
+void t2 (int *p, int x)
+{
+  if (x > 100)
+    atomic_store_explicit (p, x, memory_order_relaxed);
+}
+
+void t3 (long long *p, int x)
+{
+  if (x > 100)
+    atomic_store_explicit (p, x, memory_order_relaxed);
+}
+
+/* { dg-final { scan-assembler-times "ldrexd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-not "ldrgt" } } */
+/* { dg-final { scan-assembler-not "ldrdgt" } } */
+/* { dg-final { scan-assembler-not "ldrexdgt" } } */
+/* { dg-final { scan-assembler-not "strgt" } } */
+/* { dg-final { scan-assembler-not "strdgt" } } */
Ramana Radhakrishnan Sept. 30, 2023, 8:36 p.m. UTC | #3
+ linaro-toolchain as I don't understand the CI issues on patchwork.

On Wed, Sep 27, 2023 at 8:40 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi Ramana,
>
> > Hope this helps.
>
> Yes definitely!
>
> >> Passes regress/bootstrap, OK for commit?
> >
> > Target ? armhf ? --with-arch , -with-fpu , -with-float parameters ?
> > Please be specific.
>
> I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
> --build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
> default armhf settings are incorrect. I shouldn't need the --with-float=hard since
> that is obviously implied by armhf, and they should also imply armv7-a with vfpv3
> according to documentation. It seems to get confused and skip some tests. I tried
> using --with-fpu=auto, but that doesn't work at all, so in the end I forced it like:
> --with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more tests.

Yeah that's a wart that I don't like.

armhf just implies the hard float ABI and came into being to help
distinguish from the Base PCS for some of the distros at the time
(2010s). However we didn't want to set a baseline arch at that time
given the imminent arrival of v8-a and thus the specification of
--with-arch , --with-fpu and --with-float became second nature to many
of us working on it at that time.

I can see how it can be confusing.

With the advent of fpu=auto , I thought this could be made better but
that is a matter for another patch and another discussion.

However until then do remember to baseline to --with-arch , --with-fpu
and --with-float . Unfortunate but needed. If there is a bug with
--with-fpu=auto , good to articulate it as I understand that to be the
state of the art.

Thank you for working through it.

>
> > Since these patterns touch armv8m.baseline can you find all the
> > testcases in the testsuite and ensure no change in code for
> > armv8m.baseline as that's unpredicated already and this patch brings
> > this in line with the same ? Does the testsuite already cover these
> > arch variants and are you satisfied that the tests in the testsuite
> > can catch / don't make any additional code changes to the other
> > architectures affected by this ?
>
> There are various v8-m(.base/.main) tests and they all pass. The generated
> code is generally unchanged if there was no conditional execution. I made
> the new UNSPEC_LDR/STR patterns support offsets so there is no difference
> in generated code for relaxed loads/stores (since they used to use a plain
> load/store which has an immediate offset).
>
> >>         * onfig/arm/sync.md (arm_atomic_load<mode>): Add new pattern.
> >
> > Nit: s/onfig/config
>
> Fixed.

Thanks

>
> >>         (atomic_load<mode>): Always expand atomic loads explicitly.
> >>         (atomic_store<mode>): Always expand atomic stores explicitly.
> >
> > Nit: Change message to :
> > Switch patterns to define_expand.
>
> Fixed.

Thanks.

>
> > Largely looks ok though I cannot work out tonight if we need more v8-a
> > or v8m-baseline specific tests for scan-assembler patterns.
> >
> > Clearly our testsuite doesn't catch it , so perhaps the OP could help
> > validate this patch with their formal models to see if this fixes
> > these set of issues and creates no new regressions ? Is that feasible
> > to do ?
>
> Disabling conditional execution avoids the issue. It's trivial to verify that
> atomics can no longer be conditionally executed (no "%?"). When this is
> committed, we can run the random testing again to confirm the issue
> is no longer present.

Ok, thanks for promising to do so - I trust you to get it done. Please
try out various combinations of -march v7ve, v7-a , v8-a with the tool
as each of them have slightly different rules. For instance v7ve
allows LDREXD and STREXD to be single copy atomic for 64 bit loads
whereas v7-a did not .

>
> > -(define_insn "atomic_load<mode>"
> > -  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
> > +(define_insn "arm_atomic_load<mode>"
> > +  [(set (match_operand:QHSI 0 "register_operand" "=r,l")
> >      (unspec_volatile:QHSI
> > -      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
> > -       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]      ;; model
> > +      [(match_operand:QHSI 1 "memory_operand" "m,m")]
> >
> > Remind me again why is it safe to go from the Q constraint to the m
> > constraint here and everywhere else you've done this ?
>
> That's because the relaxed loads/stores use LDR/STR wrapped in an
> UNSPEC. To avoid regressions we have to use 'm' so that an immediate
> offset can be merged into the memory access.

This is because the instructions used here are ldr and str and the use
of the 'm' constraint is considered safe.


>
> >> -  VUNSPEC_LDA          ; Represent a store-register-acquire.
> >> +  VUNSPEC_LDR          ; Represent a load-register-relaxed.
> >> +  VUNSPEC_LDA          ; Represent a load-register-acquire.
> >
> > Nit: LDA before LDR ? Though I suspect this list can be alphabetically
> > ordered at another point of time.
>
> Swapped.

Thanks

>
> > There are new tests added for v7-a , what happens with the output for
> > v8-a and the changes for ldacq and other such instructions ?
>
> v7-a and v8-a generate the same instructions for relaxed load/store.
> The acquire/release versions are identical except they are no longer
> predicated. Basically the new patterns are not only significantly simpler,
> they are now the same between the many ARM/Thumb-2/v7-a/v8-m/v8-a
> combinations, so test coverage is much higher now. This is how these
> patterns should have been designed all along.


>
> v2 follows below.
>
> Cheers,
> Wilco
>
>
> [PATCH v2] ARM: Block predication on atomics [PR111235]
>
> The v7 memory ordering model allows reordering of conditional atomic
> instructions.  To avoid this, make all atomic patterns unconditional.
> Expand atomic loads and stores for all architectures so the memory access
> can be wrapped into an UNSPEC.
>
> gcc/ChangeLog/
>         PR target/111235
>         * config/arm/constraints.md: Remove Pf constraint.
>         * config/arm/sync.md (arm_atomic_load<mode>): Add new pattern.
>         (arm_atomic_load_acquire<mode>): Likewise.
>         (arm_atomic_store<mode>): Likewise.
>         (arm_atomic_store_release<mode>): Likewise.
>         (atomic_load<mode>): Switch patterns to define_expand.
>         (atomic_store<mode>): Likewise.
>         (arm_atomic_loaddi2_ldrd): Remove predication.
>         (arm_load_exclusive<mode>): Likewise.
>         (arm_load_acquire_exclusive<mode>): Likewise.
>         (arm_load_exclusivesi): Likewise.
>         (arm_load_acquire_exclusivesi: Likewise.
>         (arm_load_exclusivedi): Likewise.
>         (arm_load_acquire_exclusivedi): Likewise.
>         (arm_store_exclusive<mode>): Likewise.
>         (arm_store_release_exclusivedi): Likewise.
>         (arm_store_release_exclusive<mode>): Likewise.
>         * gcc/config/arm/unspecs.md: Add VUNSPEC_LDR and VUNSPEC_STR.
>
> gcc/testsuite/ChangeLog/
>         PR target/111235
>         * gcc.dg/rtl/arm/stl-cond.c: Remove test.
>         * gcc.target/arm/atomic_loaddi_7.c: Fix dmb count.
>         * gcc.target/arm/atomic_loaddi_8.c: Likewise.
>         * gcc.target/arm/pr111235.c: Add new test.
>
> ---
>
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index 05a4ebbdd67601d7b92aa44a619d17634cc69f17..d7c4a1b0cd785f276862048005e6cfa57cdcb20d 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -36,7 +36,7 @@
>  ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
>  ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
>  ;;                  Rg, Ri
> -;; in all states: Pf, Pg
> +;; in all states: Pg
>
>  ;; The following memory constraints have been used:
>  ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
> @@ -239,13 +239,6 @@ (define_constraint "Pe"
>    (and (match_code "const_int")
>         (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
>
> -(define_constraint "Pf"
> -  "Memory models except relaxed, consume or release ones."
> -  (and (match_code "const_int")
> -       (match_test "!is_mm_relaxed (memmodel_from_int (ival))
> -                   && !is_mm_consume (memmodel_from_int (ival))
> -                   && !is_mm_release (memmodel_from_int (ival))")))
> -
>  (define_constraint "Pg"
>    "@internal In Thumb-2 state a constant in range 1 to 32"
>    (and (match_code "const_int")
> diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
> index 7626bf3c443285dc63b4c4367b11a879a99c93c6..2210810f67f37ce043b8fdc73b4f21b54c5b1912 100644
> --- a/gcc/config/arm/sync.md
> +++ b/gcc/config/arm/sync.md
> @@ -62,68 +62,110 @@ (define_insn "*memory_barrier"
>     (set_attr "conds" "unconditional")
>     (set_attr "predicable" "no")])
>
> -(define_insn "atomic_load<mode>"
> -  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
> +(define_insn "arm_atomic_load<mode>"
> +  [(set (match_operand:QHSI 0 "register_operand" "=r,l")
>      (unspec_volatile:QHSI
> -      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
> -       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]      ;; model
> +      [(match_operand:QHSI 1 "memory_operand" "m,m")]
> +      VUNSPEC_LDR))]
> +  ""
> +  "ldr<sync_sfx>\t%0, %1"
> +  [(set_attr "arch" "32,any")])
> +
> +(define_insn "arm_atomic_load_acquire<mode>"
> +  [(set (match_operand:QHSI 0 "register_operand" "=r")
> +    (unspec_volatile:QHSI
> +      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q")]
>        VUNSPEC_LDA))]
>    "TARGET_HAVE_LDACQ"
> -  {
> -    if (aarch_mm_needs_acquire (operands[2]))
> -      {
> -       if (TARGET_THUMB1)
> -         return "lda<sync_sfx>\t%0, %1";
> -       else
> -         return "lda<sync_sfx>%?\t%0, %1";
> -      }
> -    else
> -      {
> -       if (TARGET_THUMB1)
> -         return "ldr<sync_sfx>\t%0, %1";
> -       else
> -         return "ldr<sync_sfx>%?\t%0, %1";
> -      }
> -  }
> -  [(set_attr "arch" "32,v8mb,any")
> -   (set_attr "predicable" "yes")])
> +  "lda<sync_sfx>\t%0, %C1"
> +)
>
> -(define_insn "atomic_store<mode>"
> -  [(set (match_operand:QHSI 0 "memory_operand" "=Q,Q,Q")
> +(define_insn "arm_atomic_store<mode>"
> +  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
> +    (unspec_volatile:QHSI
> +      [(match_operand:QHSI 1 "register_operand" "r,l")]
> +      VUNSPEC_STR))]
> +  ""
> +  "str<sync_sfx>\t%1, %0";
> +  [(set_attr "arch" "32,any")])
> +
> +(define_insn "arm_atomic_store_release<mode>"
> +  [(set (match_operand:QHSI 0 "arm_sync_memory_operand" "=Q")
>      (unspec_volatile:QHSI
> -      [(match_operand:QHSI 1 "general_operand" "r,r,l")
> -       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]      ;; model
> +      [(match_operand:QHSI 1 "register_operand" "r")]
>        VUNSPEC_STL))]
>    "TARGET_HAVE_LDACQ"
> -  {
> -    if (aarch_mm_needs_release (operands[2]))
> -      {
> -       if (TARGET_THUMB1)
> -         return "stl<sync_sfx>\t%1, %0";
> -       else
> -         return "stl<sync_sfx>%?\t%1, %0";
> -      }
> -    else
> -      {
> -       if (TARGET_THUMB1)
> -         return "str<sync_sfx>\t%1, %0";
> -       else
> -         return "str<sync_sfx>%?\t%1, %0";
> -      }
> -  }
> -  [(set_attr "arch" "32,v8mb,any")
> -   (set_attr "predicable" "yes")])
> +  "stl<sync_sfx>\t%1, %C0")
> +
> +
> +(define_expand "atomic_load<mode>"
> +  [(match_operand:QHSI 0 "register_operand")           ;; val out
> +   (match_operand:QHSI 1 "arm_sync_memory_operand")    ;; memory
> +   (match_operand:SI 2 "const_int_operand")]           ;; model
> +  ""
> +{
> +  memmodel model = memmodel_from_int (INTVAL (operands[2]));
> +
> +  if (TARGET_HAVE_LDACQ && !is_mm_relaxed (model))
> +    {
> +      emit_insn (gen_arm_atomic_load_acquire<mode> (operands[0], operands[1]));
> +      DONE;
> +    }
> +
> +  /* The seq_cst model needs a barrier before the load to block reordering with
> +     earlier accesses.  */
> +  if (is_mm_seq_cst (model))
> +    expand_mem_thread_fence (model);
> +
> +  emit_insn (gen_arm_atomic_load<mode> (operands[0], operands[1]));
> +
> +  /* All non-relaxed models need a barrier after the load when load-acquire
> +     instructions are not available.  */
> +  if (!is_mm_relaxed (model))
> +    expand_mem_thread_fence (model);
> +
> +  DONE;
> +})
> +
> +(define_expand "atomic_store<mode>"
> +  [(match_operand:QHSI 0 "arm_sync_memory_operand")    ;; memory
> +   (match_operand:QHSI 1 "register_operand")           ;; store value
> +   (match_operand:SI 2 "const_int_operand")]           ;; model
> +  ""
> +{
> +  memmodel model = memmodel_from_int (INTVAL (operands[2]));
> +
> +  if (TARGET_HAVE_LDACQ && !is_mm_relaxed (model))
> +    {
> +      emit_insn (gen_arm_atomic_store_release<mode> (operands[0], operands[1]));
> +      DONE;
> +    }
> +
> +  /* All non-relaxed models need a barrier after the load when load-acquire
> +     instructions are not available.  */
> +  if (!is_mm_relaxed (model))
> +    expand_mem_thread_fence (model);
> +
> +  emit_insn (gen_arm_atomic_store<mode> (operands[0], operands[1]));
> +
> +  /* The seq_cst model needs a barrier after the store to block reordering with
> +     later accesses.  */
> +  if (is_mm_seq_cst (model))
> +    expand_mem_thread_fence (model);
> +
> +  DONE;
> +})
>
>  ;; An LDRD instruction usable by the atomic_loaddi expander on LPAE targets
>
>  (define_insn "arm_atomic_loaddi2_ldrd"
>    [(set (match_operand:DI 0 "register_operand" "=r")
>         (unspec_volatile:DI
> -         [(match_operand:DI 1 "arm_sync_memory_operand" "Q")]
> +         [(match_operand:DI 1 "memory_operand" "m")]
>             VUNSPEC_LDRD_ATOMIC))]
>    "ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_LPAE"
> -  "ldrd%?\t%0, %H0, %C1"
> -  [(set_attr "predicable" "yes")])
> +  "ldrd\t%0, %H0, %1"
> +)
>
>  ;; There are three ways to expand this depending on the architecture
>  ;; features available.  As for the barriers, a load needs a barrier
> @@ -152,6 +194,11 @@ (define_expand "atomic_loaddi"
>        DONE;
>      }
>
> +  /* The seq_cst model needs a barrier before the load to block reordering with
> +     earlier accesses.  */
> +  if (is_mm_seq_cst (model))
> +    expand_mem_thread_fence (model);
> +
>    /* On LPAE targets LDRD and STRD accesses to 64-bit aligned
>       locations are 64-bit single-copy atomic.  We still need barriers in the
>       appropriate places to implement the ordering constraints.  */
> @@ -160,7 +207,6 @@ (define_expand "atomic_loaddi"
>    else
>      emit_insn (gen_arm_load_exclusivedi (operands[0], operands[1]));
>
> -
>    /* All non-relaxed models need a barrier after the load when load-acquire
>       instructions are not available.  */
>    if (!is_mm_relaxed (model))
> @@ -446,54 +492,42 @@ (define_insn_and_split "atomic_nand_fetch<mode>"
>    [(set_attr "arch" "32,v8mb")])
>
>  (define_insn "arm_load_exclusive<mode>"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>          (zero_extend:SI
>           (unspec_volatile:NARROW
> -           [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
> +           [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
>             VUNSPEC_LL)))]
>    "TARGET_HAVE_LDREXBH"
> -  "@
> -   ldrex<sync_sfx>%?\t%0, %C1
> -   ldrex<sync_sfx>\t%0, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "ldrex<sync_sfx>\t%0, %C1"
> +)
>
>  (define_insn "arm_load_acquire_exclusive<mode>"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>          (zero_extend:SI
>           (unspec_volatile:NARROW
> -           [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
> +           [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
>             VUNSPEC_LAX)))]
>    "TARGET_HAVE_LDACQ"
> -  "@
> -   ldaex<sync_sfx>%?\\t%0, %C1
> -   ldaex<sync_sfx>\\t%0, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "ldaex<sync_sfx>\\t%0, %C1"
> +)
>
>  (define_insn "arm_load_exclusivesi"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>         (unspec_volatile:SI
> -         [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
> +         [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
>           VUNSPEC_LL))]
>    "TARGET_HAVE_LDREX"
> -  "@
> -   ldrex%?\t%0, %C1
> -   ldrex\t%0, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "ldrex\t%0, %C1"
> +)
>
>  (define_insn "arm_load_acquire_exclusivesi"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>         (unspec_volatile:SI
> -         [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
> +         [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
>           VUNSPEC_LAX))]
>    "TARGET_HAVE_LDACQ"
> -  "@
> -   ldaex%?\t%0, %C1
> -   ldaex\t%0, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "ldaex\t%0, %C1"
> +)
>
>  (define_insn "arm_load_exclusivedi"
>    [(set (match_operand:DI 0 "s_register_operand" "=r")
> @@ -501,8 +535,8 @@ (define_insn "arm_load_exclusivedi"
>           [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
>           VUNSPEC_LL))]
>    "TARGET_HAVE_LDREXD"
> -  "ldrexd%?\t%0, %H0, %C1"
> -  [(set_attr "predicable" "yes")])
> +  "ldrexd\t%0, %H0, %C1"
> +)
>
>  (define_insn "arm_load_acquire_exclusivedi"
>    [(set (match_operand:DI 0 "s_register_operand" "=r")
> @@ -510,8 +544,8 @@ (define_insn "arm_load_acquire_exclusivedi"
>           [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
>           VUNSPEC_LAX))]
>    "TARGET_HAVE_LDACQEXD && ARM_DOUBLEWORD_ALIGN"
> -  "ldaexd%?\t%0, %H0, %C1"
> -  [(set_attr "predicable" "yes")])
> +  "ldaexd\t%0, %H0, %C1"
> +)
>
>  (define_insn "arm_store_exclusive<mode>"
>    [(set (match_operand:SI 0 "s_register_operand" "=&r")
> @@ -530,14 +564,11 @@ (define_insn "arm_store_exclusive<mode>"
>            Note that the 1st register always gets the
>            lowest word in memory.  */
>         gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
> -       return "strexd%?\t%0, %2, %H2, %C1";
> +       return "strexd\t%0, %2, %H2, %C1";
>        }
> -    if (TARGET_THUMB1)
> -      return "strex<sync_sfx>\t%0, %2, %C1";
> -    else
> -      return "strex<sync_sfx>%?\t%0, %2, %C1";
> +    return "strex<sync_sfx>\t%0, %2, %C1";
>    }
> -  [(set_attr "predicable" "yes")])
> +)
>
>  (define_insn "arm_store_release_exclusivedi"
>    [(set (match_operand:SI 0 "s_register_operand" "=&r")
> @@ -550,20 +581,16 @@ (define_insn "arm_store_release_exclusivedi"
>    {
>      /* See comment in arm_store_exclusive<mode> above.  */
>      gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
> -    return "stlexd%?\t%0, %2, %H2, %C1";
> +    return "stlexd\t%0, %2, %H2, %C1";
>    }
> -  [(set_attr "predicable" "yes")])
> +)
>
>  (define_insn "arm_store_release_exclusive<mode>"
> -  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
> +  [(set (match_operand:SI 0 "s_register_operand" "=&r")
>         (unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
> -   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua,Ua")
> +   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua")
>         (unspec_volatile:QHSI
> -         [(match_operand:QHSI 2 "s_register_operand" "r,r")]
> +         [(match_operand:QHSI 2 "s_register_operand" "r")]
>           VUNSPEC_SLX))]
>    "TARGET_HAVE_LDACQ"
> -  "@
> -   stlex<sync_sfx>%?\t%0, %2, %C1
> -   stlex<sync_sfx>\t%0, %2, %C1"
> -  [(set_attr "arch" "32,v8mb")
> -   (set_attr "predicable" "yes")])
> +  "stlex<sync_sfx>\t%0, %2, %C1")
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index 6a5b1f8f623b00dc014c9f829c591477f8faa9a1..4713ec840abae48ca70f418dbc0d4028ad4ad527 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -221,8 +221,10 @@ (define_c_enum "unspecv" [
>    VUNSPEC_SC           ; Represent a store-register-exclusive.
>    VUNSPEC_LAX          ; Represent a load-register-acquire-exclusive.
>    VUNSPEC_SLX          ; Represent a store-register-release-exclusive.
> -  VUNSPEC_LDA          ; Represent a store-register-acquire.
> +  VUNSPEC_LDA          ; Represent a load-register-acquire.
> +  VUNSPEC_LDR          ; Represent a load-register-relaxed.
>    VUNSPEC_STL          ; Represent a store-register-release.
> +  VUNSPEC_STR          ; Represent a store-register-relaxed.
>    VUNSPEC_GET_FPSCR    ; Represent fetch of FPSCR content.
>    VUNSPEC_SET_FPSCR    ; Represent assign of FPSCR content.
>    VUNSPEC_SET_FPSCR_NZCVQC     ; Represent assign of FPSCR_nzcvqc content.
> diff --git a/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c b/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c
> deleted file mode 100644
> index e47ca6bf36dac500b90d80e52e150a19c95c7219..0000000000000000000000000000000000000000
> --- a/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c
> +++ /dev/null
> @@ -1,61 +0,0 @@
> -/* { dg-do compile { target arm*-*-* } } */
> -/* { dg-require-effective-target arm_arm_ok } */
> -/* { dg-require-effective-target arm_arch_v8a_ok } */
> -/* { dg-options "-O2 -marm" } */
> -/* { dg-add-options arm_arch_v8a } */
> -
> -/* We want to test that the STL instruction gets the conditional
> -   suffix when under a COND_EXEC.  However, COND_EXEC is very hard to
> -   generate from C code because the atomic_store expansion adds a compiler
> -   barrier before the insn, preventing if-conversion.  So test the output
> -   here with a hand-crafted COND_EXEC wrapped around an STL.  */
> -
> -void __RTL (startwith ("final")) foo (int *a, int b)
> -{
> -(function "foo"
> -  (param "a"
> -    (DECL_RTL (reg/v:SI r0))
> -    (DECL_RTL_INCOMING (reg:SI r0))
> -  )
> -  (param "b"
> -    (DECL_RTL (reg/v:SI r1))
> -    (DECL_RTL_INCOMING (reg:SI r1))
> -  )
> -  (insn-chain
> -    (block 2
> -       (edge-from entry (flags "FALLTHRU"))
> -       (cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
> -
> -  (insn:TI 7 (parallel [
> -       (set (reg:CC cc)
> -            (compare:CC (reg:SI r1)
> -                        (const_int 0)))
> -       (set (reg/v:SI r1)
> -            (reg:SI r1 ))
> -        ])  ;; {*movsi_compare0}
> -     (nil))
> -
> -  ;; A conditional atomic store-release: STLNE for Armv8-A.
> -  (insn 10 (cond_exec (ne (reg:CC cc)
> -          (const_int 0))
> -       (set (mem/v:SI (reg/v/f:SI r0) [-1  S4 A32])
> -               (unspec_volatile:SI [
> -               (reg/v:SI r1)
> -               (const_int 3)
> -               ] VUNSPEC_STL))) ;; {*p atomic_storesi}
> -       (expr_list:REG_DEAD (reg:CC cc)
> -       (expr_list:REG_DEAD (reg/v:SI r1)
> -       (expr_list:REG_DEAD (reg/v/f:SI r0)
> -               (nil)))))
> -      (edge-to exit (flags "FALLTHRU"))
> -    ) ;; block 2
> -  ) ;; insn-chain
> -  (crtl
> -    (return_rtx
> -      (reg/i:SI r0)
> -    ) ;; return_rtx
> -  ) ;; crtl
> -) ;; function
> -}
> -
> -/* { dg-final { scan-assembler "stlne" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c b/gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
> index 6743663f1e6831d3fc441e80fcfe3d761987970b..79e36edbb8fa86712b8d283b9a81b794610f4813 100644
> --- a/gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
> +++ b/gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
> @@ -6,4 +6,4 @@
>  #include "atomic_loaddi_seq_cst.x"
>
>  /* { dg-final { scan-assembler-times "ldrexd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 1 } } */
> -/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
> +/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c b/gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
> index f7bd3e5a2b515d91418f5f4a7a8db00336a8b594..7241d361313997b0133056c1a3a7c1b546708e43 100644
> --- a/gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
> +++ b/gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
> @@ -6,4 +6,4 @@
>  #include "atomic_loaddi_seq_cst.x"
>
>  /* { dg-final { scan-assembler-times "ldrd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 1 } } */
> -/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
> +/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/arm/pr111235.c b/gcc/testsuite/gcc.target/arm/pr111235.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b06a5bfb8e294edf04ec9b7b6d8fe5d1ac73b1e7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr111235.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target arm_arch_v7a_ok } */
> +/* { dg-add-options arm_arch_v7a } */
> +
> +#include <stdatomic.h>
> +
> +int t0 (int *p, int x)
> +{
> +  if (x > 100)
> +    x = atomic_load_explicit (p, memory_order_relaxed);
> +  return x + 1;
> +}
> +
> +long long t1 (long long *p, int x)
> +{
> +  if (x > 100)
> +    x = atomic_load_explicit (p, memory_order_relaxed);
> +  return x + 1;
> +}
> +
> +void t2 (int *p, int x)
> +{
> +  if (x > 100)
> +    atomic_store_explicit (p, x, memory_order_relaxed);
> +}
> +
> +void t3 (long long *p, int x)
> +{
> +  if (x > 100)
> +    atomic_store_explicit (p, x, memory_order_relaxed);
> +}
> +
> +/* { dg-final { scan-assembler-times "ldrexd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 2 } } */
> +/* { dg-final { scan-assembler-not "ldrgt" } } */
> +/* { dg-final { scan-assembler-not "ldrdgt" } } */
> +/* { dg-final { scan-assembler-not "ldrexdgt" } } */
> +/* { dg-final { scan-assembler-not "strgt" } } */
> +/* { dg-final { scan-assembler-not "strdgt" } } */


Ok if no regressions but as you might get nagged by the post commit CI ...

While it is not policy yet to look at these bots but given the
enthusiasm at Cauldron for patchwork and pre-commit CI and because all
my armhf boxes are boxed up, I decided to do something a bit novel !

I tried reviewing this via patchwork

https://patchwork.sourceware.org/project/gcc/patch/PAWPR08MB8982A6AA40749B74CAD14C5783EEA@PAWPR08MB8982.eurprd08.prod.outlook.com/

and notice that

https://ci.linaro.org/job/tcwg_gcc_build--master-arm-precommit/2393/artifact/artifacts/artifacts.precommit/notify/mail-body.txt
says nothing could be built.

Possibly worth double checking the status for it being a false
negative as to why the build failed.

It was green on patchwork but remembering that Green is not Green for
CI in patchwork I clicked on the afore mentioned ci.linaro.org link
and see that it's actually broken.

regards

Ramana
Wilco Dijkstra Oct. 2, 2023, 5:12 p.m. UTC | #4
Hi Ramana,

>> I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
>> --build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
>> default armhf settings are incorrect. I shouldn't need the --with-float=hard since
>> that is obviously implied by armhf, and they should also imply armv7-a with vfpv3
>> according to documentation. It seems to get confused and skip some tests. I tried
>> using --with-fpu=auto, but that doesn't work at all, so in the end I forced it like:
>> --with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more tests.
> 
> Yeah that's a wart that I don't like.
> 
> armhf just implies the hard float ABI and came into being to help
> distinguish from the Base PCS for some of the distros at the time
> (2010s). However we didn't want to set a baseline arch at that time
> given the imminent arrival of v8-a and thus the specification of
> --with-arch , --with-fpu and --with-float became second nature to many
> of us working on it at that time.

Looking at it, the default is indeed incorrect, you get:
'-mcpu=arm10e' '-mfloat-abi=hard' '-marm' '-march=armv5te+fp'

That's like 25 years out of date!

However all the armhf distros have Armv7-a as the baseline and use Thumb-2:
'-mfloat-abi=hard' '-mthumb' '-march=armv7-a+fp'

So the issue is that dg-require-effective-target arm_arch_v7a_ok doesn't work on
armhf. It seems that if you specify an architecture even with hard-float configured,
it turns off FP and then complains because hard-float implies you must have FP...

So in most configurations Iincluding the one used by distro compilers) we basically
skip lots of tests for no apparent reason...

> Ok, thanks for promising to do so - I trust you to get it done. Please
> try out various combinations of -march v7ve, v7-a , v8-a with the tool
> as each of them have slightly different rules. For instance v7ve
> allows LDREXD and STREXD to be single copy atomic for 64 bit loads
> whereas v7-a did not .

You mean LDRD may be generated on CPUs with LPAE. We use LDREXD by
default since that is always atomic on v7-a.

> Ok if no regressions but as you might get nagged by the post commit CI ...

Thanks, I've committed it. Those links don't show anything concrete, however I do note
the CI didn't pick up v2.

Btw you're happy with backports if there are no issues reported for a few days?

Cheers,
Wilco
Maxim Kuvyrkov Oct. 3, 2023, 8:23 a.m. UTC | #5
> On Oct 1, 2023, at 00:36, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> 
> + linaro-toolchain as I don't understand the CI issues on patchwork.
> 
> 
...
> Ok if no regressions but as you might get nagged by the post commit CI ...

I don't see any pre-commit failures for this patch, but regardless of what results are for pre-commit CI, there's always a chance to identify problems in post-commit CI -- simply because we test wa-a-ay more configurations in post-commit CI than in pre-commit CI.

> 
> While it is not policy yet to look at these bots but given the
> enthusiasm at Cauldron for patchwork and pre-commit CI and because all
> my armhf boxes are boxed up, I decided to do something a bit novel !
> 
> I tried reviewing this via patchwork
> 
> https://patchwork.sourceware.org/project/gcc/patch/PAWPR08MB8982A6AA40749B74CAD14C5783EEA@PAWPR08MB8982.eurprd08.prod.outlook.com/
> 
> and notice that
> 
> https://ci.linaro.org/job/tcwg_gcc_build--master-arm-precommit/2393/artifact/artifacts/artifacts.precommit/notify/mail-body.txt
> says nothing could be built.

Um, no.  This says ...
===
Results changed to
# reset_artifacts:
-10
# true:
0
# build_abe gcc:
1

From
# reset_artifacts:
-10
# true:
0
# build_abe gcc:
1
===
... i.e., build succeeded both before and after patch.  We'll change the boilerplate intro for successful builds from ...
"Dear contributor, our automatic CI has detected problems related to your patch(es)."
... to ...
"Dear contributor, you are awesome, no CI failures related to your patch(es)".

One things that is strange -- testsuite builds were not triggered, we have only 2 reports from build tests, but are missing another 2 reports from testsuite tests.

> 
> Possibly worth double checking the status for it being a false
> negative as to why the build failed.

Pre-commit CI is happy with the patch, albeit testsuite checks didn't run for some reason.  Regardless, we'll quickly catch and report any fallout in the post-commit CI once the patch is merged.

> 
> It was green on patchwork but remembering that Green is not Green for
> CI in patchwork I clicked on the afore mentioned ci.linaro.org link
> and see that it's actually broken.

Unfortunately, I seem to have confused developers about green and red at my Cauldron presentation.  "Green/Red" in patchwork mean the usual PASS/FAIL.  It's only in post-commit CI in jenkins interface green and red mean something different.

--
Maxim Kuvyrkov
https://www.linaro.org
Richard Earnshaw Oct. 20, 2023, 4:47 p.m. UTC | #6
On 02/10/2023 18:12, Wilco Dijkstra wrote:
> Hi Ramana,
> 
>>> I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
>>> --build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
>>> default armhf settings are incorrect. I shouldn't need the --with-float=hard since
>>> that is obviously implied by armhf, and they should also imply armv7-a with vfpv3
>>> according to documentation. It seems to get confused and skip some tests. I tried
>>> using --with-fpu=auto, but that doesn't work at all, so in the end I forced it like:
>>> --with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more tests.
>>
>> Yeah that's a wart that I don't like.
>>
>> armhf just implies the hard float ABI and came into being to help
>> distinguish from the Base PCS for some of the distros at the time
>> (2010s). However we didn't want to set a baseline arch at that time
>> given the imminent arrival of v8-a and thus the specification of
>> --with-arch , --with-fpu and --with-float became second nature to many
>> of us working on it at that time.
> 
> Looking at it, the default is indeed incorrect, you get:
> '-mcpu=arm10e' '-mfloat-abi=hard' '-marm' '-march=armv5te+fp'
That's not incorrect.  It's the first version of the architecture that 
can support the hard-float ABI.

> 
> That's like 25 years out of date!

It's not a matter of being out of date (and it's only 22 years since 
arm1020e was announced ;) it's a matter of being as compatible as we can 
be with existing hardware out-of-the-box.  Distros are free, of course, 
to set a higher bar and do so.

> 
> However all the armhf distros have Armv7-a as the baseline and use Thumb-2:
> '-mfloat-abi=hard' '-mthumb' '-march=armv7-a+fp'

Wrong.  Rawhide uses Arm state (or it did last I checked).  As I 
mentioned above, distros are free to set a higher bar.

> 
> So the issue is that dg-require-effective-target arm_arch_v7a_ok doesn't work on
> armhf. It seems that if you specify an architecture even with hard-float configured,
> it turns off FP and then complains because hard-float implies you must have FP...

OK, I think I see the problem there, it's in the data for
         proc add_options_for_arm_arch_FUNC

in lib/target-supports.exp.  In order to work correctly with -mfpu=auto, 
the -march flags in the table need "+fp" adding in most cases (pretty 
much everything from armv5e onwards) - that's harmless whenever the 
float-abi is soft, but should do the right thing when softfp or hard are 
used.

> 
> So in most configurations Iincluding the one used by distro compilers) we basically
> skip lots of tests for no apparent reason...
> 
>> Ok, thanks for promising to do so - I trust you to get it done. Please
>> try out various combinations of -march v7ve, v7-a , v8-a with the tool
>> as each of them have slightly different rules. For instance v7ve
>> allows LDREXD and STREXD to be single copy atomic for 64 bit loads
>> whereas v7-a did not .
> 
> You mean LDRD may be generated on CPUs with LPAE. We use LDREXD by
> default since that is always atomic on v7-a.
> 
>> Ok if no regressions but as you might get nagged by the post commit CI ...
> 
> Thanks, I've committed it. Those links don't show anything concrete, however I do note
> the CI didn't pick up v2.
> 
> Btw you're happy with backports if there are no issues reported for a few days?
> 
> Cheers,
> Wilco

R.
diff mbox series

Patch

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 05a4ebbdd67601d7b92aa44a619d17634cc69f17..d7c4a1b0cd785f276862048005e6cfa57cdcb20d 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@ 
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
 ;;		     Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pg
 
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -239,13 +239,6 @@  (define_constraint "Pe"
   (and (match_code "const_int")
        (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
 
-(define_constraint "Pf"
-  "Memory models except relaxed, consume or release ones."
-  (and (match_code "const_int")
-       (match_test "!is_mm_relaxed (memmodel_from_int (ival))
-		    && !is_mm_consume (memmodel_from_int (ival))
-		    && !is_mm_release (memmodel_from_int (ival))")))
-
 (define_constraint "Pg"
   "@internal In Thumb-2 state a constant in range 1 to 32"
   (and (match_code "const_int")
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 7626bf3c443285dc63b4c4367b11a879a99c93c6..2210810f67f37ce043b8fdc73b4f21b54c5b1912 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -62,68 +62,110 @@  (define_insn "*memory_barrier"
    (set_attr "conds" "unconditional")
    (set_attr "predicable" "no")])
 
-(define_insn "atomic_load<mode>"
-  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
+(define_insn "arm_atomic_load<mode>"
+  [(set (match_operand:QHSI 0 "register_operand" "=r,l")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
-       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]	;; model
+      [(match_operand:QHSI 1 "memory_operand" "m,m")]
+      VUNSPEC_LDR))]
+  ""
+  "ldr<sync_sfx>\t%0, %1"
+  [(set_attr "arch" "32,any")])
+
+(define_insn "arm_atomic_load_acquire<mode>"
+  [(set (match_operand:QHSI 0 "register_operand" "=r")
+    (unspec_volatile:QHSI
+      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q")]
       VUNSPEC_LDA))]
   "TARGET_HAVE_LDACQ"
-  {
-    if (aarch_mm_needs_acquire (operands[2]))
-      {
-	if (TARGET_THUMB1)
-	  return "lda<sync_sfx>\t%0, %1";
-	else
-	  return "lda<sync_sfx>%?\t%0, %1";
-      }
-    else
-      {
-	if (TARGET_THUMB1)
-	  return "ldr<sync_sfx>\t%0, %1";
-	else
-	  return "ldr<sync_sfx>%?\t%0, %1";
-      }
-  }
-  [(set_attr "arch" "32,v8mb,any")
-   (set_attr "predicable" "yes")])
+  "lda<sync_sfx>\t%0, %C1"
+)
 
-(define_insn "atomic_store<mode>"
-  [(set (match_operand:QHSI 0 "memory_operand" "=Q,Q,Q")
+(define_insn "arm_atomic_store<mode>"
+  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
+    (unspec_volatile:QHSI
+      [(match_operand:QHSI 1 "register_operand" "r,l")]
+      VUNSPEC_STR))]
+  ""
+  "str<sync_sfx>\t%1, %0";
+  [(set_attr "arch" "32,any")])
+
+(define_insn "arm_atomic_store_release<mode>"
+  [(set (match_operand:QHSI 0 "arm_sync_memory_operand" "=Q")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "general_operand" "r,r,l")
-       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]	;; model
+      [(match_operand:QHSI 1 "register_operand" "r")]
       VUNSPEC_STL))]
   "TARGET_HAVE_LDACQ"
-  {
-    if (aarch_mm_needs_release (operands[2]))
-      {
-	if (TARGET_THUMB1)
-	  return "stl<sync_sfx>\t%1, %0";
-	else
-	  return "stl<sync_sfx>%?\t%1, %0";
-      }
-    else
-      {
-	if (TARGET_THUMB1)
-	  return "str<sync_sfx>\t%1, %0";
-	else
-	  return "str<sync_sfx>%?\t%1, %0";
-      }
-  }
-  [(set_attr "arch" "32,v8mb,any")
-   (set_attr "predicable" "yes")])
+  "stl<sync_sfx>\t%1, %C0")
+
+
+(define_expand "atomic_load<mode>"
+  [(match_operand:QHSI 0 "register_operand")		;; val out
+   (match_operand:QHSI 1 "arm_sync_memory_operand")	;; memory
+   (match_operand:SI 2 "const_int_operand")]		;; model
+  ""
+{
+  memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+  if (TARGET_HAVE_LDACQ && !is_mm_relaxed (model))
+    {
+      emit_insn (gen_arm_atomic_load_acquire<mode> (operands[0], operands[1]));
+      DONE;
+    }
+
+  /* The seq_cst model needs a barrier before the load to block reordering with
+     earlier accesses.  */
+  if (is_mm_seq_cst (model))
+    expand_mem_thread_fence (model);
+
+  emit_insn (gen_arm_atomic_load<mode> (operands[0], operands[1]));
+
+  /* All non-relaxed models need a barrier after the load when load-acquire
+     instructions are not available.  */
+  if (!is_mm_relaxed (model))
+    expand_mem_thread_fence (model);
+
+  DONE;
+})
+
+(define_expand "atomic_store<mode>"
+  [(match_operand:QHSI 0 "arm_sync_memory_operand")	;; memory
+   (match_operand:QHSI 1 "register_operand")		;; store value
+   (match_operand:SI 2 "const_int_operand")]		;; model
+  ""
+{
+  memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+  if (TARGET_HAVE_LDACQ && !is_mm_relaxed (model))
+    {
+      emit_insn (gen_arm_atomic_store_release<mode> (operands[0], operands[1]));
+      DONE;
+    }
+
+  /* All non-relaxed models need a barrier after the load when load-acquire
+     instructions are not available.  */
+  if (!is_mm_relaxed (model))
+    expand_mem_thread_fence (model);
+
+  emit_insn (gen_arm_atomic_store<mode> (operands[0], operands[1]));
+
+  /* The seq_cst model needs a barrier after the store to block reordering with
+     later accesses.  */
+  if (is_mm_seq_cst (model))
+    expand_mem_thread_fence (model);
+
+  DONE;
+})
 
 ;; An LDRD instruction usable by the atomic_loaddi expander on LPAE targets
 
 (define_insn "arm_atomic_loaddi2_ldrd"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec_volatile:DI
-	  [(match_operand:DI 1 "arm_sync_memory_operand" "Q")]
+	  [(match_operand:DI 1 "memory_operand" "m")]
 	    VUNSPEC_LDRD_ATOMIC))]
   "ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_LPAE"
-  "ldrd%?\t%0, %H0, %C1"
-  [(set_attr "predicable" "yes")])
+  "ldrd\t%0, %H0, %1"
+)
 
 ;; There are three ways to expand this depending on the architecture
 ;; features available.  As for the barriers, a load needs a barrier
@@ -152,6 +194,11 @@  (define_expand "atomic_loaddi"
       DONE;
     }
 
+  /* The seq_cst model needs a barrier before the load to block reordering with
+     earlier accesses.  */
+  if (is_mm_seq_cst (model))
+    expand_mem_thread_fence (model);
+
   /* On LPAE targets LDRD and STRD accesses to 64-bit aligned
      locations are 64-bit single-copy atomic.  We still need barriers in the
      appropriate places to implement the ordering constraints.  */
@@ -160,7 +207,6 @@  (define_expand "atomic_loaddi"
   else
     emit_insn (gen_arm_load_exclusivedi (operands[0], operands[1]));
 
-
   /* All non-relaxed models need a barrier after the load when load-acquire
      instructions are not available.  */
   if (!is_mm_relaxed (model))
@@ -446,54 +492,42 @@  (define_insn_and_split "atomic_nand_fetch<mode>"
   [(set_attr "arch" "32,v8mb")])
 
 (define_insn "arm_load_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
         (zero_extend:SI
 	  (unspec_volatile:NARROW
-	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
 	    VUNSPEC_LL)))]
   "TARGET_HAVE_LDREXBH"
-  "@
-   ldrex<sync_sfx>%?\t%0, %C1
-   ldrex<sync_sfx>\t%0, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "ldrex<sync_sfx>\t%0, %C1"
+)
 
 (define_insn "arm_load_acquire_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
         (zero_extend:SI
 	  (unspec_volatile:NARROW
-	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
 	    VUNSPEC_LAX)))]
   "TARGET_HAVE_LDACQ"
-  "@
-   ldaex<sync_sfx>%?\\t%0, %C1
-   ldaex<sync_sfx>\\t%0, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "ldaex<sync_sfx>\\t%0, %C1"
+)
 
 (define_insn "arm_load_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec_volatile:SI
-	  [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
 	  VUNSPEC_LL))]
   "TARGET_HAVE_LDREX"
-  "@
-   ldrex%?\t%0, %C1
-   ldrex\t%0, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "ldrex\t%0, %C1"
+)
 
 (define_insn "arm_load_acquire_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec_volatile:SI
-	  [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
 	  VUNSPEC_LAX))]
   "TARGET_HAVE_LDACQ"
-  "@
-   ldaex%?\t%0, %C1
-   ldaex\t%0, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "ldaex\t%0, %C1"
+)
 
 (define_insn "arm_load_exclusivedi"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
@@ -501,8 +535,8 @@  (define_insn "arm_load_exclusivedi"
 	  [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
 	  VUNSPEC_LL))]
   "TARGET_HAVE_LDREXD"
-  "ldrexd%?\t%0, %H0, %C1"
-  [(set_attr "predicable" "yes")])
+  "ldrexd\t%0, %H0, %C1"
+)
 
 (define_insn "arm_load_acquire_exclusivedi"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
@@ -510,8 +544,8 @@  (define_insn "arm_load_acquire_exclusivedi"
 	  [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
 	  VUNSPEC_LAX))]
   "TARGET_HAVE_LDACQEXD && ARM_DOUBLEWORD_ALIGN"
-  "ldaexd%?\t%0, %H0, %C1"
-  [(set_attr "predicable" "yes")])
+  "ldaexd\t%0, %H0, %C1"
+)
 
 (define_insn "arm_store_exclusive<mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
@@ -530,14 +564,11 @@  (define_insn "arm_store_exclusive<mode>"
 	   Note that the 1st register always gets the
 	   lowest word in memory.  */
 	gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
-	return "strexd%?\t%0, %2, %H2, %C1";
+	return "strexd\t%0, %2, %H2, %C1";
       }
-    if (TARGET_THUMB1)
-      return "strex<sync_sfx>\t%0, %2, %C1";
-    else
-      return "strex<sync_sfx>%?\t%0, %2, %C1";
+    return "strex<sync_sfx>\t%0, %2, %C1";
   }
-  [(set_attr "predicable" "yes")])
+)
 
 (define_insn "arm_store_release_exclusivedi"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
@@ -550,20 +581,16 @@  (define_insn "arm_store_release_exclusivedi"
   {
     /* See comment in arm_store_exclusive<mode> above.  */
     gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
-    return "stlexd%?\t%0, %2, %H2, %C1";
+    return "stlexd\t%0, %2, %H2, %C1";
   }
-  [(set_attr "predicable" "yes")])
+)
 
 (define_insn "arm_store_release_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
+  [(set (match_operand:SI 0 "s_register_operand" "=&r")
 	(unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
-   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua,Ua")
+   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua")
 	(unspec_volatile:QHSI
-	  [(match_operand:QHSI 2 "s_register_operand" "r,r")]
+	  [(match_operand:QHSI 2 "s_register_operand" "r")]
 	  VUNSPEC_SLX))]
   "TARGET_HAVE_LDACQ"
-  "@
-   stlex<sync_sfx>%?\t%0, %2, %C1
-   stlex<sync_sfx>\t%0, %2, %C1"
-  [(set_attr "arch" "32,v8mb")
-   (set_attr "predicable" "yes")])
+  "stlex<sync_sfx>\t%0, %2, %C1")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index dccda283573208bd5db4f04c10ae9dbbfdda49dd..ba86ce0992f2b14a5e65ac09784b3fb3539de035 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -221,7 +221,9 @@  (define_c_enum "unspecv" [
   VUNSPEC_SC		; Represent a store-register-exclusive.
   VUNSPEC_LAX		; Represent a load-register-acquire-exclusive.
   VUNSPEC_SLX		; Represent a store-register-release-exclusive.
-  VUNSPEC_LDA		; Represent a store-register-acquire.
+  VUNSPEC_LDR		; Represent a load-register-relaxed.
+  VUNSPEC_LDA		; Represent a load-register-acquire.
+  VUNSPEC_STR		; Represent a store-register-relaxed.
   VUNSPEC_STL		; Represent a store-register-release.
   VUNSPEC_GET_FPSCR	; Represent fetch of FPSCR content.
   VUNSPEC_SET_FPSCR	; Represent assign of FPSCR content.
diff --git a/gcc/testsuite/gcc.target/arm/pr111235.c b/gcc/testsuite/gcc.target/arm/pr111235.c
new file mode 100644
index 0000000000000000000000000000000000000000..923b231afa888d326bcdc0ecabbcf8ba223d365a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr111235.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -O" } */
+/* { dg-require-effective-target arm_arch_v7a_ok } */
+/* { dg-add-options arm_arch_v7a } */
+
+int t0 (int *p, int x)
+{
+  if (x > 100)
+    x = atomic_load_explicit (p, memory_order_relaxed);
+  return x + 1;
+}
+
+long long t1 (long long *p, int x)
+{
+  if (x > 100)
+    x = atomic_load_explicit (p, memory_order_relaxed);
+  return x + 1;
+}
+
+void t2 (int *p, int x)
+{
+  if (x > 100)
+    atomic_store_explicit (p, x, memory_order_relaxed);
+}
+
+void t3 (long long *p, long long x)
+{
+  if (x > 100)
+    atomic_store_explicit (p, x, memory_order_relaxed);
+}
+
+
+/* { dg-final { scan-assembler-times "ldrexd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
+