diff mbox

powerpc: Use lwsync on 64bit

Message ID 20170213102220.31ba7d56@kryten
State New
Headers show

Commit Message

Anton Blanchard Feb. 12, 2017, 11:22 p.m. UTC
Either an isync or an lwsync can be used as an acquire barrier after
a larx/stcx/bne sequence. All 64bit CPUs support lwsync and since the
isync instruction has other side effects that we don't need, use lwsync.

2017-02-12  Anton Blanchard  <anton@samba.org>

	* sysdeps/powerpc/atomic-machine.h: Allow __ARCH_ACQ_INSTR to be
	overridden.
	* sysdeps/powerpc/powerpc64/atomic-machine.h: define __ARCH_ACQ_INSTR

Comments

Tulio Magno Quites Machado Filho Feb. 13, 2017, 1:42 p.m. UTC | #1
Anton Blanchard <anton@samba.org> writes:

> Either an isync or an lwsync can be used as an acquire barrier after
> a larx/stcx/bne sequence. All 64bit CPUs support lwsync and since the
> isync instruction has other side effects that we don't need, use lwsync.
>
> 2017-02-12  Anton Blanchard  <anton@samba.org>
>
> 	* sysdeps/powerpc/atomic-machine.h: Allow __ARCH_ACQ_INSTR to be
> 	overridden.
> 	* sysdeps/powerpc/powerpc64/atomic-machine.h: define __ARCH_ACQ_INSTR

LGTM.
Adhemerval Zanella Netto Feb. 13, 2017, 2:23 p.m. UTC | #2
On 13/02/2017 11:42, Tulio Magno Quites Machado Filho wrote:
> Anton Blanchard <anton@samba.org> writes:
> 
>> Either an isync or an lwsync can be used as an acquire barrier after
>> a larx/stcx/bne sequence. All 64bit CPUs support lwsync and since the
>> isync instruction has other side effects that we don't need, use lwsync.
>>
>> 2017-02-12  Anton Blanchard  <anton@samba.org>
>>
>> 	* sysdeps/powerpc/atomic-machine.h: Allow __ARCH_ACQ_INSTR to be
>> 	overridden.
>> 	* sysdeps/powerpc/powerpc64/atomic-machine.h: define __ARCH_ACQ_INSTR
> 
> LGTM.
> 

I do not think this is correct for all the primitives that use
__ARCH_ACQ_INSTR.  For instance __arch_atomic_exchange_32_acq is defined as

#define __arch_atomic_exchange_32_acq(mem, value)                             \
  ({                                                                          \
    __typeof (*mem) __val;                                                    \
    __asm __volatile (                                                        \
                      "1:       lwarx   %0,0,%2" MUTEX_HINT_ACQ "\n"          \
                      "         stwcx.  %3,0,%2\n"                            \
                      "         bne-    1b\n"                                 \
                      "   " __ARCH_ACQ_INSTR                                  \
                      : "=&r" (__val), "=m" (*mem)                            \
                      : "b" (mem), "r" (value), "m" (*mem)                    \
                      : "cr0", "memory");                                     \
    __val;                                                                    \
  })


Which is analogous to C11:

#include <stdatomic.h>
#include <stdint.h>

uint32_t foo (uint32_t *x)
{
  return atomic_exchange_explicit (x, 0, memory_order_acquire);
}

And GCC 6.2.1 uses and 'isync' a memory barrier.  It is also on par with [1]
which defines acquire semantics to require 'isync' instructions.


[1] https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
cseo Feb. 13, 2017, 2:24 p.m. UTC | #3
On 2/12/17, 9:22 PM, "Anton Blanchard" <libc-alpha-owner@sourceware.org on behalf of anton@samba.org> wrote:

    Either an isync or an lwsync can be used as an acquire barrier after
    a larx/stcx/bne sequence. All 64bit CPUs support lwsync and since the
    isync instruction has other side effects that we don't need, use lwsync.
    

OK


--
Carlos Eduardo Seo
Software Engineer - Linux on Power Toolchain
cseo@linux.vnet.ibm.com
Tulio Magno Quites Machado Filho Feb. 13, 2017, 3:25 p.m. UTC | #4
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> I do not think this is correct for all the primitives that use
> __ARCH_ACQ_INSTR.  For instance __arch_atomic_exchange_32_acq is defined as
>
> #define __arch_atomic_exchange_32_acq(mem, value)                             \
>   ({                                                                          \
>     __typeof (*mem) __val;                                                    \
>     __asm __volatile (                                                        \
>                       "1:       lwarx   %0,0,%2" MUTEX_HINT_ACQ "\n"          \
>                       "         stwcx.  %3,0,%2\n"                            \
>                       "         bne-    1b\n"                                 \
>                       "   " __ARCH_ACQ_INSTR                                  \
>                       : "=&r" (__val), "=m" (*mem)                            \
>                       : "b" (mem), "r" (value), "m" (*mem)                    \
>                       : "cr0", "memory");                                     \
>     __val;                                                                    \
>   })
>
>
> Which is analogous to C11:
>
> #include <stdatomic.h>
> #include <stdint.h>
>
> uint32_t foo (uint32_t *x)
> {
>   return atomic_exchange_explicit (x, 0, memory_order_acquire);
> }
>
> And GCC 6.2.1 uses and 'isync' a memory barrier.  It is also on par with [1]
> which defines acquire semantics to require 'isync' instructions.
>
>
> [1] https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

Good point.
Per this document, it would also mess with the spinlock.

Anton, are you trying to fix a more specific issue?
Torvald Riegel Feb. 13, 2017, 7:44 p.m. UTC | #5
On Mon, 2017-02-13 at 12:23 -0200, Adhemerval Zanella wrote:
> 
> On 13/02/2017 11:42, Tulio Magno Quites Machado Filho wrote:
> > Anton Blanchard <anton@samba.org> writes:
> > 
> >> Either an isync or an lwsync can be used as an acquire barrier after
> >> a larx/stcx/bne sequence. All 64bit CPUs support lwsync and since the
> >> isync instruction has other side effects that we don't need, use lwsync.
> >>
> >> 2017-02-12  Anton Blanchard  <anton@samba.org>
> >>
> >> 	* sysdeps/powerpc/atomic-machine.h: Allow __ARCH_ACQ_INSTR to be
> >> 	overridden.
> >> 	* sysdeps/powerpc/powerpc64/atomic-machine.h: define __ARCH_ACQ_INSTR
> > 
> > LGTM.
> > 
> 
> I do not think this is correct for all the primitives that use
> __ARCH_ACQ_INSTR.  For instance __arch_atomic_exchange_32_acq is defined as
> 
> #define __arch_atomic_exchange_32_acq(mem, value)                             \
>   ({                                                                          \
>     __typeof (*mem) __val;                                                    \
>     __asm __volatile (                                                        \
>                       "1:       lwarx   %0,0,%2" MUTEX_HINT_ACQ "\n"          \
>                       "         stwcx.  %3,0,%2\n"                            \
>                       "         bne-    1b\n"                                 \
>                       "   " __ARCH_ACQ_INSTR                                  \
>                       : "=&r" (__val), "=m" (*mem)                            \
>                       : "b" (mem), "r" (value), "m" (*mem)                    \
>                       : "cr0", "memory");                                     \
>     __val;                                                                    \
>   })
> 
> 
> Which is analogous to C11:
> 
> #include <stdatomic.h>
> #include <stdint.h>
> 
> uint32_t foo (uint32_t *x)
> {
>   return atomic_exchange_explicit (x, 0, memory_order_acquire);
> }
> 
> And GCC 6.2.1 uses and 'isync' a memory barrier.  It is also on par with [1]
> which defines acquire semantics to require 'isync' instructions.
> 
> 
> [1] https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

I don't know enough about the details of powerpc to comment in detail.
However, I'd like to make a few general comments:

* We should strive to be as close to C11 atomics as produced by the
compiler.  For example, if GCC issues an isync instead of lwsync, we
should be fine doing the same.  If you think GCC is generating
inefficient code, please take the change to GCC first and reach
consensus for a change there.  We'll just use GCC's atomics eventually
(though MUTEX_HINT_ACQ may be a reason for a special case), so applying
any change in GCC is necessary anyway to make such a change effective.

* Any changes in the atomics implementation should be reviewed carefully
by people deeply familiar with the HW memory models and how they map to
C11.  If in doubt, take your time.
diff mbox

Patch

diff --git a/sysdeps/powerpc/atomic-machine.h b/sysdeps/powerpc/atomic-machine.h
index 0a58203..31d67fa 100644
--- a/sysdeps/powerpc/atomic-machine.h
+++ b/sysdeps/powerpc/atomic-machine.h
@@ -57,7 +57,9 @@  typedef uintmax_t uatomic_max_t;
 # define __ARCH_ACQ_INSTR	""
 # define __ARCH_REL_INSTR	""
 #else
-# define __ARCH_ACQ_INSTR	"isync"
+# ifndef __ARCH_ACQ_INSTR
+#  define __ARCH_ACQ_INSTR	"isync"
+# endif
 # ifndef __ARCH_REL_INSTR
 #  define __ARCH_REL_INSTR	"sync"
 # endif
diff --git a/sysdeps/powerpc/powerpc64/atomic-machine.h b/sysdeps/powerpc/powerpc64/atomic-machine.h
index 40c308e..76c586a 100644
--- a/sysdeps/powerpc/powerpc64/atomic-machine.h
+++ b/sysdeps/powerpc/powerpc64/atomic-machine.h
@@ -230,6 +230,7 @@ 
  * "light weight" sync can also be used for the release barrier.
  */
 #ifndef UP
+# define __ARCH_ACQ_INSTR	"lwsync"
 # define __ARCH_REL_INSTR	"lwsync"
 #endif
 #define atomic_write_barrier()	__asm ("lwsync" ::: "memory")