
[1/3,AArch64] Strengthen barriers for sync-fetch-op builtins.

Message ID 555E004C.3060704@arm.com
State New

Commit Message

Matthew Wahab May 21, 2015, 3:57 p.m. UTC
On AArch64, the __sync builtins are implemented using the __atomic operations
and barriers. This makes the __sync builtins inconsistent with their
documentation, which requires stronger barriers than those for the __atomic
builtins.

The difference between the __sync and __atomic builtins is that the restrictions
imposed by a __sync operation's barrier apply to all memory references, while the
restrictions of an __atomic operation's barrier only need to apply to a
subset. This affects AArch64 in particular because, although its implementation
of the __atomic builtins is correct, the barriers generated are too weak for the
__sync builtins.
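
To make the difference concrete, here is a minimal C sketch (an editorial
illustration, not part of the patch). Both builtins are standard GCC builtins;
the comments restate their documented barrier semantics rather than the exact
instructions any particular compiler emits:

  #include <stdint.h>

  static int64_t counter;

  int64_t
  sync_increment (void)
  {
    /* __sync builtin: documented as a full barrier, so the ordering
       restriction applies to all memory references around it.  */
    return __sync_fetch_and_add (&counter, 1);
  }

  int64_t
  atomic_increment (void)
  {
    /* __atomic builtin: the barrier only needs to cover the accesses
       required by the memory model argument (here acquire-release).  */
    return __atomic_fetch_add (&counter, 1, __ATOMIC_ACQ_REL);
  }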

The affected __sync builtins are the __sync_fetch_and_op (and
__sync_op_and_fetch) functions, __sync_compare_and_swap and
__sync_lock_test_and_set. This patch and a following one modify the code
generated for these functions, weakening the initial load-acquire to a simple
load and adding a final fence to prevent code-hoisting. The last patch will add
tests for the code generated by the AArch64 backend for the __sync builtins.

- Full barriers:  __sync_fetch_and_op, __sync_op_and_fetch
   __sync_*_compare_and_swap

   [load-acquire; code; store-release]
   becomes
   [load; code; store-release; fence].

- Acquire barriers:  __sync_lock_test_and_set

   [load-acquire; code; store]
   becomes
   [load; code; store; fence]

The code generated for release barriers and for the __atomic builtins is
unchanged.
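
For reference, a small C sketch (editorial illustration only, not verified
compiler output) that exercises the affected builtins; the comments restate the
intended AArch64 sequences from the list above:

  #include <stdint.h>

  static int32_t value;
  static int32_t lock;

  int32_t
  fetch_and_add (int32_t x)
  {
    /* Full barrier case: [load; code; store-release; fence].  */
    return __sync_fetch_and_add (&value, x);
  }

  int32_t
  compare_swap (int32_t expected, int32_t desired)
  {
    /* Full barrier case: [load; code; store-release; fence].  */
    return __sync_val_compare_and_swap (&value, expected, desired);
  }

  int32_t
  test_and_set (void)
  {
    /* Acquire barrier case: [load; code; store; fence].  */
    return __sync_lock_test_and_set (&lock, 1);
  }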

This patch changes the code generated for __sync_fetch_and_<op> and
__sync_<op>_and_fetch builtins.

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

gcc/
2015-05-21  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
	(aarch64_split_atomic_op): Check for __sync memory models, emit
	appropriate initial and final barriers.

Comments

Ramana Radhakrishnan May 22, 2015, 11:26 a.m. UTC | #1
>
> Ok for trunk?

I can't approve, but do you mind taking care of -march=armv8-a in the
arm backend too, as that would have the same issues?

Ramana

> Matthew
>
> gcc/
> 2015-05-21  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
>         (aarch64_split_atomic_op): Check for __sync memory models, emit
>         appropriate initial and final barriers.
>
>
Matthew Wahab May 22, 2015, 11:47 a.m. UTC | #2
On 22/05/15 12:26, Ramana Radhakrishnan wrote:
>>
>> Ok for trunk?
>
> I can't approve, but do you mind taking care of -march=armv8-a in the
> arm backend too, as that would have the same issues?
>

Will do,
Matthew
James Greenhalgh May 26, 2015, 9:32 a.m. UTC | #3
On Thu, May 21, 2015 at 04:57:00PM +0100, Matthew Wahab wrote:
> On AArch64, the __sync builtins are implemented using the __atomic operations
> and barriers. This makes the __sync builtins inconsistent with their
> documentation, which requires stronger barriers than those for the __atomic
> builtins.

<snip>

> Ok for trunk?

Close, but I have some comments on style.

Please tie this to the open PR in the ChangeLog entry.

> 
> gcc/
> 2015-05-21  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.

If I were being picky, this should be something like
aarch64_emit_sync_op_epilogue, but I don't want to bikeshed too much.


> 	(aarch64_split_atomic_op): Check for __sync memory models, emit
> 	appropriate initial and final barriers.

I don't see any new initial barriers. I think you are referring to
relaxing the ldaxr to an ldxr for __sync primitives, in which case, say
that.

> From 2092902d2738b0c24a6272e0b3480bb9cffd275c Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Fri, 15 May 2015 09:26:28 +0100
> Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.
> 
> Change-Id: I3342a572d672163ffc703e4e51603744680334fc
> ---
>  gcc/config/aarch64/aarch64.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 7f0cc0d..778571f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9249,6 +9249,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
>    emit_insn (gen_rtx_SET (bval, x));
>  }
>  
> +/* Emit a post-operation barrier.  */

This comment could do with some more detail. What is a post-operation
barrier? When do we need one? What is the MODEL parameter?

> +static void
> +aarch64_emit_post_barrier (enum memmodel model)
> +{
> +  const enum memmodel base_model = memmodel_base (model);
> +
> +  if (is_mm_sync (model)
> +      && (base_model == MEMMODEL_ACQUIRE
> +	  || base_model == MEMMODEL_ACQ_REL
> +	  || base_model == MEMMODEL_SEQ_CST))
> +    {
> +      emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
> +    }
> +}
> +
>  /* Split a compare and swap pattern.  */
>  
>  void
> @@ -9311,12 +9327,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
>  {
>    machine_mode mode = GET_MODE (mem);
>    machine_mode wmode = (mode == DImode ? DImode : SImode);
> +  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
> +  const bool is_sync = is_mm_sync (model);
> +  rtx load_model_rtx = model_rtx;
>    rtx_code_label *label;
>    rtx x;
>  
>    label = gen_label_rtx ();
>    emit_label (label);
>  
> +  /* A __sync operation will emit a final fence to stop code hoisting, so the

Can we pick a consistent terminology between fence/barrier? They are
currently used interchangeably, but I think we generally prefer barrier
in the AArch64 port.

> +     load can be relaxed.  */
> +  if (is_sync)
> +    load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
> +
>    if (new_out)
>      new_out = gen_lowpart (wmode, new_out);
>    if (old_out)
> @@ -9325,7 +9349,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
>      old_out = new_out;
>    value = simplify_gen_subreg (wmode, value, mode, 0);
>  
> -  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
> +  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);

To my mind, these two hunks would be marginally easier to follow if
we combined them, as so:

  /* A __sync operation will emit a final barrier to stop code hoisting,
     so the load can be relaxed.  */
  if (is_sync)
    aarch64_emit_load_exclusive (mode, old_out,
				 mem, GEN_INT (MEMMODEL_RELAXED));
  else
    aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);

it is just one less piece of information to juggle when thinking through
what we are emitting here.

>  
>    switch (code)
>      {
> @@ -9361,6 +9385,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
>    x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
>  			    gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
>    aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +
> +  /* Emit any fence needed for a __sync operation.  */
> +  if (is_sync)
> +    aarch64_emit_post_barrier (model);
>  }

Cheers,
James

Patch

From 2092902d2738b0c24a6272e0b3480bb9cffd275c Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I3342a572d672163ffc703e4e51603744680334fc
---
 gcc/config/aarch64/aarch64.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7f0cc0d..778571f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9249,6 +9249,22 @@  aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Emit a post-operation barrier.  */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+  const enum memmodel base_model = memmodel_base (model);
+
+  if (is_mm_sync (model)
+      && (base_model == MEMMODEL_ACQUIRE
+	  || base_model == MEMMODEL_ACQ_REL
+	  || base_model == MEMMODEL_SEQ_CST))
+    {
+      emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+    }
+}
+
 /* Split a compare and swap pattern.  */
 
 void
@@ -9311,12 +9327,20 @@  aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+  const bool is_sync = is_mm_sync (model);
+  rtx load_model_rtx = model_rtx;
   rtx_code_label *label;
   rtx x;
 
   label = gen_label_rtx ();
   emit_label (label);
 
+  /* A __sync operation will emit a final fence to stop code hoisting, so the
+     load can be relaxed.  */
+  if (is_sync)
+    load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+
   if (new_out)
     new_out = gen_lowpart (wmode, new_out);
   if (old_out)
@@ -9325,7 +9349,7 @@  aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);
 
   switch (code)
     {
@@ -9361,6 +9385,10 @@  aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			    gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  /* Emit any fence needed for a __sync operation.  */
+  if (is_sync)
+    aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
1.9.1