diff mbox

[v2] Fix atomic_full_barrier on x86 and x86_64.

Message ID 1435066479.25759.79.camel@localhost.localdomain
State New
Headers show

Commit Message

Torvald Riegel June 23, 2015, 1:34 p.m. UTC
This is a revision of
https://sourceware.org/ml/libc-alpha/2014-10/msg00702.

This fixes BZ #17403 by defining atomic_full_barrier,
atomic_read_barrier, and atomic_write_barrier on x86 and x86_64.  A full
barrier is implemented through an atomic idempotent modification to the
stack and not through using mfence because the latter can supposedly be
somewhat slower due to having to provide stronger guarantees wrt
self-modifying code, for example.

Tested on x86_64-linux.

H.J., is this OK for x86 and x32?


2015-06-23  Torvald Riegel  <triegel@redhat.com>

	[BZ #17403]
	* sysdeps/x86_64/bits/atomic.h: (atomic_full_barrier,
	atomic_read_barrier, atomic_write_barrier): Define.
	* sysdeps/i386/i486/bits/atomic.h (atomic_full_barrier,
	atomic_read_barrier, atomic_write_barrier): Define.

Comments

H.J. Lu June 23, 2015, 3:45 p.m. UTC | #1
On Tue, Jun 23, 2015 at 6:34 AM, Torvald Riegel <triegel@redhat.com> wrote:
> This is a revision of
> https://sourceware.org/ml/libc-alpha/2014-10/msg00702.
>
> This fixes BZ #17403 by defining atomic_full_barrier,
> atomic_read_barrier, and atomic_write_barrier on x86 and x86_64.  A full
> barrier is implemented through an atomic idempotent modification to the
> stack and not through using mfence because the latter can supposedly be
> somewhat slower due to having to provide stronger guarantees wrt
> self-modifying code, for example.
>
> Tested on x86_64-linux.
>
> H.J., is this OK for x86 and x32?
>
>
> 2015-06-23  Torvald Riegel  <triegel@redhat.com>
>
>         [BZ #17403]
>         * sysdeps/x86_64/bits/atomic.h: (atomic_full_barrier,
>         atomic_read_barrier, atomic_write_barrier): Define.
>         * sysdeps/i386/i486/bits/atomic.h (atomic_full_barrier,
>         atomic_read_barrier, atomic_write_barrier): Define.
>
>

It looks good to me.

Thanks.
Rich Felker June 23, 2015, 4:03 p.m. UTC | #2
On Tue, Jun 23, 2015 at 03:34:39PM +0200, Torvald Riegel wrote:
> This is a revision of
> https://sourceware.org/ml/libc-alpha/2014-10/msg00702.
> 
> This fixes BZ #17403 by defining atomic_full_barrier,
> atomic_read_barrier, and atomic_write_barrier on x86 and x86_64.  A full
> barrier is implemented through an atomic idempotent modification to the
> stack and not through using mfence because the latter can supposedly be
> somewhat slower due to having to provide stronger guarantees wrt
> self-modifying code, for example.

Small nit in the description: "idempotent" means n applications of the
operation are the same as one application. Here the intended meaning
is that the operation makes no change at all, i.e. n applications are
equivalent to 0 applications. I'm not sure what the right word is
though.

Rich
diff mbox

Patch

commit ae5defc04e383b92b7828833783b36d1d49c188c
Author: Torvald Riegel <triegel@redhat.com>
Date:   Wed Oct 29 10:34:36 2014 +0100

    Fix atomic_full_barrier on x86 and x86_64.
    
    This fixes BZ #17403 by defining atomic_full_barrier,
    atomic_read_barrier, and atomic_write_barrier on x86 and x86_64.  A full
    barrier is implemented through an atomic idempotent modification to the
    stack and not through using mfence because the latter can supposedly be
    somewhat slower due to having to provide stronger guarantees wrt.
    self-modifying code, for example.

diff --git a/NEWS b/NEWS
index 4e21210..323ea52 100644
--- a/NEWS
+++ b/NEWS
@@ -12,18 +12,18 @@  Version 2.22
   438, 4719, 6792, 13028, 13064, 14094, 14841, 14906, 14958, 15319, 15467,
   15790, 15969, 16159, 16339, 16350, 16351, 16352, 16353, 16361, 16512,
   16560, 16704, 16783, 16850, 17053, 17090, 17195, 17269, 17293, 17322,
-  17523, 17542, 17569, 17581, 17588, 17596, 17620, 17621, 17628, 17631,
-  17692, 17711, 17715, 17776, 17779, 17792, 17836, 17912, 17916, 17930,
-  17932, 17944, 17949, 17964, 17965, 17967, 17969, 17978, 17987, 17991,
-  17996, 17998, 17999, 18007, 18019, 18020, 18029, 18030, 18032, 18034,
-  18036, 18038, 18039, 18042, 18043, 18046, 18047, 18049, 18068, 18080,
-  18093, 18100, 18104, 18110, 18111, 18116, 18125, 18128, 18138, 18185,
-  18196, 18197, 18206, 18210, 18211, 18217, 18220, 18221, 18234, 18244,
-  18247, 18287, 18319, 18324, 18333, 18346, 18397, 18409, 18410, 18412,
-  18418, 18422, 18434, 18444, 18468, 18469, 18470, 18479, 18483, 18495,
-  18496, 18497, 18498, 18507, 18512, 18513, 18519, 18520, 18522, 18527,
-  18528, 18529, 18530, 18532, 18533, 18534, 18536, 18539, 18540, 18542,
-  18544, 18545, 18546, 18547, 18553, 18558, 18569.
+  17403, 17523, 17542, 17569, 17581, 17588, 17596, 17620, 17621, 17628,
+  17631, 17692, 17711, 17715, 17776, 17779, 17792, 17836, 17912, 17916,
+  17930, 17932, 17944, 17949, 17964, 17965, 17967, 17969, 17978, 17987,
+  17991, 17996, 17998, 17999, 18007, 18019, 18020, 18029, 18030, 18032,
+  18034, 18036, 18038, 18039, 18042, 18043, 18046, 18047, 18049, 18068,
+  18080, 18093, 18100, 18104, 18110, 18111, 18116, 18125, 18128, 18138,
+  18185, 18196, 18197, 18206, 18210, 18211, 18217, 18220, 18221, 18234,
+  18244, 18247, 18287, 18319, 18324, 18333, 18346, 18397, 18409, 18410,
+  18412, 18418, 18422, 18434, 18444, 18468, 18469, 18470, 18479, 18483,
+  18495, 18496, 18497, 18498, 18507, 18512, 18513, 18519, 18520, 18522,
+  18527, 18528, 18529, 18530, 18532, 18533, 18534, 18536, 18539, 18540,
+  18542, 18544, 18545, 18546, 18547, 18553, 18558, 18569.
 
 * Cache information can be queried via sysconf() function on s390 e.g. with
   _SC_LEVEL1_ICACHE_SIZE as argument.
diff --git a/sysdeps/i386/i486/bits/atomic.h b/sysdeps/i386/i486/bits/atomic.h
index 01c2b94..59165be 100644
--- a/sysdeps/i386/i486/bits/atomic.h
+++ b/sysdeps/i386/i486/bits/atomic.h
@@ -535,3 +535,10 @@  typedef uintmax_t uatomic_max_t;
 #define atomic_or(mem, mask) __arch_or_body (LOCK_PREFIX, mem, mask)
 
 #define catomic_or(mem, mask) __arch_or_body (__arch_cprefix, mem, mask)
+
+/* We don't use mfence because it is supposedly slower due to having to
+   provide stronger guarantees (e.g., regarding self-modifying code).  */
+#define atomic_full_barrier() \
+    __asm __volatile (LOCK_PREFIX "orl $0, (%%esp)" ::: "memory")
+#define atomic_read_barrier() __asm ("" ::: "memory")
+#define atomic_write_barrier() __asm ("" ::: "memory")
diff --git a/sysdeps/x86_64/bits/atomic.h b/sysdeps/x86_64/bits/atomic.h
index 5d08146..203d92c 100644
--- a/sysdeps/x86_64/bits/atomic.h
+++ b/sysdeps/x86_64/bits/atomic.h
@@ -472,3 +472,10 @@  typedef uintmax_t uatomic_max_t;
 #define atomic_or(mem, mask) __arch_or_body (LOCK_PREFIX, mem, mask)
 
 #define catomic_or(mem, mask) __arch_or_body (__arch_cprefix, mem, mask)
+
+/* We don't use mfence because it is supposedly slower due to having to
+   provide stronger guarantees (e.g., regarding self-modifying code).  */
+#define atomic_full_barrier() \
+    __asm __volatile (LOCK_PREFIX "orl $0, (%%rsp)" ::: "memory")
+#define atomic_read_barrier() __asm ("" ::: "memory")
+#define atomic_write_barrier() __asm ("" ::: "memory")