diff mbox

[5/5] atomic: base mb_read/mb_set on load-acquire and store-release

Message ID 1476107947-31430-6-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Oct. 10, 2016, 1:59 p.m. UTC
This introduces load-acquire and store-release operations in QEMU.
For now, just use them as an implementation detail of atomic_mb_read
and atomic_mb_set.

Since docs/atomics.txt documents that atomic_mb_read only synchronizes
with an atomic_mb_set of the same variable, we can use the new implementation
everywhere instead of seq-cst loads and stores.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 docs/atomics.txt      |  5 +--
 include/qemu/atomic.h | 93 +++++++++++++++++----------------------------------
 2 files changed, 34 insertions(+), 64 deletions(-)

Comments

Alex Bennée Oct. 12, 2016, 9:28 a.m. UTC | #1
Paolo Bonzini <pbonzini@redhat.com> writes:

> This introduces load-acquire and store-release operations in QEMU.
> For now, just use them as an implementation detail of atomic_mb_read
> and atomic_mb_set.
>
> Since docs/atomics.txt documents that atomic_mb_read only synchronizes
> with an atomic_mb_set of the same variable, we can use the new implementation
> everywhere instead of seq-cst loads and stores.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
<snip>

> +/* This is more efficient than a store plus a fence.  */
> +#if defined(__i386__) || defined(__x86_64__) || defined(__s390x__)
> +#define atomic_mb_set(ptr, i)  ((void)atomic_xchg(ptr, i))
> +#endif

Is this working around a compiler issue? Shouldn't it already be using
the best instructions for the constraint?

> +
> +#ifndef atomic_mb_read
> +#define atomic_mb_read(ptr)                             \
> +    atomic_load_acquire(ptr)
> +#endif
> +
> +#ifndef atomic_mb_set
> +#define atomic_mb_set(ptr, i)  do {                     \
> +    atomic_store_release(ptr, i);                       \
> +    smp_mb();                                           \
> +} while(0)
> +#endif
> +
>  #endif /* QEMU_ATOMIC_H */


--
Alex Bennée
Paolo Bonzini Oct. 12, 2016, 9:30 a.m. UTC | #2
On 12/10/2016 11:28, Alex Bennée wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> This introduces load-acquire and store-release operations in QEMU.
>> For now, just use them as an implementation detail of atomic_mb_read
>> and atomic_mb_set.
>>
>> Since docs/atomics.txt documents that atomic_mb_read only synchronizes
>> with an atomic_mb_set of the same variable, we can use the new implementation
>> everywhere instead of seq-cst loads and stores.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
> <snip>
> 
>> +/* This is more efficient than a store plus a fence.  */
>> +#if defined(__i386__) || defined(__x86_64__) || defined(__s390x__)
>> +#define atomic_mb_set(ptr, i)  ((void)atomic_xchg(ptr, i))
>> +#endif
> 
> Is this working around a compiler issue? Shouldn't it already be using
> the best instructions for the constraint?

Store release + memory barrier can be compiled into a seqcst store, and
is more efficient if you have a single instruction for that purpose, but
I don't know of any compiler that does it.

Paolo

>> +
>> +#ifndef atomic_mb_read
>> +#define atomic_mb_read(ptr)                             \
>> +    atomic_load_acquire(ptr)
>> +#endif
>> +
>> +#ifndef atomic_mb_set
>> +#define atomic_mb_set(ptr, i)  do {                     \
>> +    atomic_store_release(ptr, i);                       \
>> +    smp_mb();                                           \
>> +} while(0)
>> +#endif
>> +
>>  #endif /* QEMU_ATOMIC_H */
> 
> 
> --
> Alex Bennée
>
Pranith Kumar Oct. 20, 2016, 6:36 p.m. UTC | #3
Hi,

On Mon, Oct 10, 2016 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> This introduces load-acquire and store-release operations in QEMU.
> For now, just use them as an implementation detail of atomic_mb_read
> and atomic_mb_set.
>
> Since docs/atomics.txt documents that atomic_mb_read only synchronizes
> with an atomic_mb_set of the same variable, we can use the new implementation
> everywhere instead of seq-cst loads and stores.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  docs/atomics.txt      |  5 +--
>  include/qemu/atomic.h | 93 +++++++++++++++++----------------------------------
>  2 files changed, 34 insertions(+), 64 deletions(-)
>
> diff --git a/docs/atomics.txt b/docs/atomics.txt
> index 5cd8e32..9a1f8aa 100644
> --- a/docs/atomics.txt
> +++ b/docs/atomics.txt
> @@ -374,8 +374,9 @@ and memory barriers, and the equivalents in QEMU:
>    note that smp_store_mb() is a little weaker than atomic_mb_set().
>    atomic_mb_read() compiles to the same instructions as Linux's
>    smp_load_acquire(), but this should be treated as an implementation
> -  detail.  If required, QEMU might later add atomic_load_acquire() and
> -  atomic_store_release() macros.
> +  detail.  QEMU does have atomic_load_acquire() and atomic_store_release()
> +  macros, but for now they are only used within atomic.h.  This may
> +  change in the future.
>
>
>  SOURCES
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index b108df0..d940e98 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -135,44 +135,18 @@
>      __atomic_store_n(ptr, i, __ATOMIC_RELEASE);       \
>  } while(0)
>
> -/* atomic_mb_read/set semantics map Java volatile variables. They are
> - * less expensive on some platforms (notably POWER & ARMv7) than fully
> - * sequentially consistent operations.
> - *
> - * As long as they are used as paired operations they are safe to
> - * use. See docs/atomic.txt for more discussion.
> - */
> -
> -#if defined(_ARCH_PPC)
> -#define atomic_mb_read(ptr)                             \
> -    ({                                                  \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
> -    typeof_strip_qual(*ptr) _val;                       \
> -     __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
> -     smp_mb_acquire();                                  \
> -    _val;                                               \
> -    })
> -
> -#define atomic_mb_set(ptr, i)  do {                     \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
> -    smp_mb_release();                                   \
> -    __atomic_store_n(ptr, i, __ATOMIC_RELAXED);         \
> -    smp_mb();                                           \
> -} while(0)
> -#else
> -#define atomic_mb_read(ptr)                             \
> +#define atomic_load_acquire(ptr)                        \
>      ({                                                  \
>      QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
>      typeof_strip_qual(*ptr) _val;                       \
> -    __atomic_load(ptr, &_val, __ATOMIC_SEQ_CST);        \
> +    __atomic_load(ptr, &_val, __ATOMIC_ACQUIRE);        \
>      _val;                                               \
>      })
>

Is there any reason we are not using __atomic_load_n() here?

--
Pranith
diff mbox

Patch

diff --git a/docs/atomics.txt b/docs/atomics.txt
index 5cd8e32..9a1f8aa 100644
--- a/docs/atomics.txt
+++ b/docs/atomics.txt
@@ -374,8 +374,9 @@  and memory barriers, and the equivalents in QEMU:
   note that smp_store_mb() is a little weaker than atomic_mb_set().
   atomic_mb_read() compiles to the same instructions as Linux's
   smp_load_acquire(), but this should be treated as an implementation
-  detail.  If required, QEMU might later add atomic_load_acquire() and
-  atomic_store_release() macros.
+  detail.  QEMU does have atomic_load_acquire() and atomic_store_release()
+  macros, but for now they are only used within atomic.h.  This may
+  change in the future.
 
 
 SOURCES
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index b108df0..d940e98 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -135,44 +135,18 @@ 
     __atomic_store_n(ptr, i, __ATOMIC_RELEASE);       \
 } while(0)
 
-/* atomic_mb_read/set semantics map Java volatile variables. They are
- * less expensive on some platforms (notably POWER & ARMv7) than fully
- * sequentially consistent operations.
- *
- * As long as they are used as paired operations they are safe to
- * use. See docs/atomic.txt for more discussion.
- */
-
-#if defined(_ARCH_PPC)
-#define atomic_mb_read(ptr)                             \
-    ({                                                  \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
-    typeof_strip_qual(*ptr) _val;                       \
-     __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
-     smp_mb_acquire();                                  \
-    _val;                                               \
-    })
-
-#define atomic_mb_set(ptr, i)  do {                     \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
-    smp_mb_release();                                   \
-    __atomic_store_n(ptr, i, __ATOMIC_RELAXED);         \
-    smp_mb();                                           \
-} while(0)
-#else
-#define atomic_mb_read(ptr)                             \
+#define atomic_load_acquire(ptr)                        \
     ({                                                  \
     QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
     typeof_strip_qual(*ptr) _val;                       \
-    __atomic_load(ptr, &_val, __ATOMIC_SEQ_CST);        \
+    __atomic_load(ptr, &_val, __ATOMIC_ACQUIRE);        \
     _val;                                               \
     })
 
-#define atomic_mb_set(ptr, i)  do {                     \
+#define atomic_store_release(ptr, i)  do {              \
     QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
-    __atomic_store_n(ptr, i, __ATOMIC_SEQ_CST);         \
+    __atomic_store_n(ptr, i, __ATOMIC_RELEASE);         \
 } while(0)
-#endif
 
 
 /* All the remaining operations are fully sequentially consistent */
@@ -248,11 +222,6 @@ 
  */
 #define atomic_xchg(ptr, i)    (barrier(), __sync_lock_test_and_set(ptr, i))
 
-/*
- * Load/store with Java volatile semantics.
- */
-#define atomic_mb_set(ptr, i)  ((void)atomic_xchg(ptr, i))
-
 #elif defined(_ARCH_PPC)
 
 /*
@@ -343,41 +312,16 @@ 
     atomic_set(ptr, i);                           \
 } while (0)
 
-/* These have the same semantics as Java volatile variables.
- * See http://gee.cs.oswego.edu/dl/jmm/cookbook.html:
- * "1. Issue a StoreStore barrier (wmb) before each volatile store."
- *  2. Issue a StoreLoad barrier after each volatile store.
- *     Note that you could instead issue one before each volatile load, but
- *     this would be slower for typical programs using volatiles in which
- *     reads greatly outnumber writes. Alternatively, if available, you
- *     can implement volatile store as an atomic instruction (for example
- *     XCHG on x86) and omit the barrier. This may be more efficient if
- *     atomic instructions are cheaper than StoreLoad barriers.
- *  3. Issue LoadLoad and LoadStore barriers after each volatile load."
- *
- * If you prefer to think in terms of "pairing" of memory barriers,
- * an atomic_mb_read pairs with an atomic_mb_set.
- *
- * And for the few ia64 lovers that exist, an atomic_mb_read is a ld.acq,
- * while an atomic_mb_set is a st.rel followed by a memory barrier.
- *
- * These are a bit weaker than __atomic_load/store with __ATOMIC_SEQ_CST
- * (see docs/atomics.txt), and I'm not sure that __ATOMIC_ACQ_REL is enough.
- * Just always use the barriers manually by the rules above.
- */
-#define atomic_mb_read(ptr)    ({           \
+#define atomic_load_acquire(ptr)    ({      \
     typeof(*ptr) _val = atomic_read(ptr);   \
     smp_mb_acquire();                       \
     _val;                                   \
 })
 
-#ifndef atomic_mb_set
-#define atomic_mb_set(ptr, i)  do {         \
+#define atomic_store_release(ptr, i)  do {  \
     smp_mb_release();                       \
     atomic_set(ptr, i);                     \
-    smp_mb();                               \
 } while (0)
-#endif
 
 #ifndef atomic_xchg
 #if defined(__clang__)
@@ -414,4 +358,29 @@ 
 #define smp_rmb()   smp_mb_acquire()
 #endif
 
+/* atomic_mb_read/set semantics map Java volatile variables. They are
+ * less expensive on some platforms (notably POWER) than fully
+ * sequentially consistent operations.
+ *
+ * As long as they are used as paired operations they are safe to
+ * use. See docs/atomic.txt for more discussion.
+ */
+
+/* This is more efficient than a store plus a fence.  */
+#if defined(__i386__) || defined(__x86_64__) || defined(__s390x__)
+#define atomic_mb_set(ptr, i)  ((void)atomic_xchg(ptr, i))
+#endif
+
+#ifndef atomic_mb_read
+#define atomic_mb_read(ptr)                             \
+    atomic_load_acquire(ptr)
+#endif
+
+#ifndef atomic_mb_set
+#define atomic_mb_set(ptr, i)  do {                     \
+    atomic_store_release(ptr, i);                       \
+    smp_mb();                                           \
+} while(0)
+#endif
+
 #endif /* QEMU_ATOMIC_H */