[1/3] arch: Introduce load_acquire() and store_release()

Message ID 20141113192723.12579.25343.stgit@ahduyck-server
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Alexander Duyck Nov. 13, 2014, 7:27 p.m. UTC
It is common for device drivers to make use of acquire/release semantics
when dealing with descriptors stored in device memory.  On reviewing the
documentation and code for smp_load_acquire() and smp_store_release() as
well as reviewing an IBM website that goes over the use of PowerPC barriers
at http://www.ibm.com/developerworks/systems/articles/powerpc.html it
occurred to me that the same code could likely be applied to device drivers.

As a result this patch introduces load_acquire() and store_release().  The
load_acquire() function can be used in situations where a test
for ownership must be followed by a memory barrier.  The example below is
from ixgbe:

	if (!rx_desc->wb.upper.status_error)
		break;

	/* This memory barrier is needed to keep us from reading
	 * any other fields out of the rx_desc until we know the
	 * descriptor has been written back
	 */
	rmb();

With load_acquire() this can be changed to:

	if (!load_acquire(&rx_desc->wb.upper.status_error))
		break;
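
The acquire semantic guarantees that any subsequent reads of the
descriptor are ordered after the status check.  As a sketch (the length
read below is assumed from the ixgbe descriptor layout, for
illustration only):

	if (!load_acquire(&rx_desc->wb.upper.status_error))
		break;

	/* safe: this read cannot be reordered before the acquire above */
	size = le16_to_cpu(rx_desc->wb.upper.length);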

A similar change can be made in the release path of many drivers.  For
example in the Realtek r8169 driver there are a number of flows that
consist of something like the following:

	wmb();

	status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
	txd->opts1 = cpu_to_le32(status);

	tp->cur_tx += frags + 1;

	wmb();

With store_release() this can be changed to the following:

	status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
	store_release(&txd->opts1, cpu_to_le32(status));

	tp->cur_tx += frags + 1;

	wmb();
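
On powerpc, for example, the store_release() added below expands to an
lwsync followed by the store, so the example above would compile to
roughly (a sketch based on the patch, not generated output):

	/* lwsync orders the opts[]/len stores against the store below */
	__lwsync();
	ACCESS_ONCE(txd->opts1) = cpu_to_le32(status);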

The resulting assembler code can be significantly less expensive on
architectures such as x86 and s390 that support strong ordering.  On
architectures such as powerpc, ia64, and arm64 that can use primitives
other than rmb()/wmb() we should see gains from the less expensive
barriers, while the remaining architectures end up using a mb(), which
may cost as much as or more than a rmb()/wmb() since we must ensure
load/store ordering.
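
For reference, on x86 (TSO) the patch implements these with nothing more
than a compiler barrier; this is the store side, taken from the diff
below with a comment added:

	#define store_release(p, v)					\
	do {								\
		compiletime_assert_atomic_type(*p);			\
		barrier();	/* compiler-only; TSO does the rest */	\
		ACCESS_ONCE(*p) = (v);					\
	} while (0)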

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 arch/arm/include/asm/barrier.h      |   15 +++++++++
 arch/arm64/include/asm/barrier.h    |   59 ++++++++++++++++++-----------------
 arch/ia64/include/asm/barrier.h     |    7 +++-
 arch/metag/include/asm/barrier.h    |   15 +++++++++
 arch/mips/include/asm/barrier.h     |   15 +++++++++
 arch/powerpc/include/asm/barrier.h  |   24 +++++++++++---
 arch/s390/include/asm/barrier.h     |    7 +++-
 arch/sparc/include/asm/barrier_64.h |    6 ++--
 arch/x86/include/asm/barrier.h      |   22 ++++++++++++-
 include/asm-generic/barrier.h       |   15 +++++++++
 10 files changed, 144 insertions(+), 41 deletions(-)


Comments

Will Deacon Nov. 14, 2014, 10:19 a.m. UTC | #1
Hi Alex,

On Thu, Nov 13, 2014 at 07:27:23PM +0000, Alexander Duyck wrote:
> It is common for device drivers to make use of acquire/release semantics
> when dealing with descriptors stored in device memory.  On reviewing the
> documentation and code for smp_load_acquire() and smp_store_release() as
> well as reviewing an IBM website that goes over the use of PowerPC barriers
> at http://www.ibm.com/developerworks/systems/articles/powerpc.html it
> occurred to me that the same code could likely be applied to device drivers.
> 
> As a result this patch introduces load_acquire() and store_release().  The
> load_acquire() function can be used in situations where a test
> for ownership must be followed by a memory barrier.  The example below is
> from ixgbe:
> 
>         if (!rx_desc->wb.upper.status_error)
>                 break;
> 
>         /* This memory barrier is needed to keep us from reading
>          * any other fields out of the rx_desc until we know the
>          * descriptor has been written back
>          */
>         rmb();
> 
> With load_acquire() this can be changed to:
> 
>         if (!load_acquire(&rx_desc->wb.upper.status_error))
>                 break;

I still don't think this is a good idea for the specific use-case you're
highlighting.

On ARM, an mb() can be *significantly* more expensive than an rmb() (since
we may have to drain store buffers on an outer L2 cache) and on arm64 it's
not at all clear that an LDAR is more efficient than an LDR; DMB LD
sequence. I can certainly imagine implementations where the latter would
be preferred.

So, whilst I'm perfectly fine to go along with mandatory acquire/release
macros (we should probably add a check to barf on __iomem pointers), I
don't agree with using them in preference to finer-grained read/write
barriers. Doing so will have a real impact on I/O performance.
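
(A minimal sketch of such a check, assuming sparse's __CHECKER__
address-space warnings; the helper name is hypothetical and not part of
this patch:

	#ifdef __CHECKER__
	/* sparse warns if a __iomem pointer is passed here */
	extern void __chk_not_io_ptr(const volatile void *);
	#else
	# define __chk_not_io_ptr(p) ((void)0)
	#endif

It would be invoked as __chk_not_io_ptr(p) at the top of load_acquire()
and store_release().)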

Finally, do you know of any architectures where load_acquire/store_release
aren't implemented the same way as the smp_* variants on SMP kernels?

Will
David Laight Nov. 14, 2014, 10:45 a.m. UTC | #2
From: Alexander Duyck
> It is common for device drivers to make use of acquire/release semantics
> when dealing with descriptors stored in device memory.  On reviewing the
> documentation and code for smp_load_acquire() and smp_store_release() as
> well as reviewing an IBM website that goes over the use of PowerPC barriers
> at http://www.ibm.com/developerworks/systems/articles/powerpc.html it
> occurred to me that the same code could likely be applied to device drivers.
>
> As a result this patch introduces load_acquire() and store_release().  The
> load_acquire() function can be used in situations where a test
> for ownership must be followed by a memory barrier.  The example below is
> from ixgbe:
>
> 	if (!rx_desc->wb.upper.status_error)
> 		break;
>
> 	/* This memory barrier is needed to keep us from reading
> 	 * any other fields out of the rx_desc until we know the
> 	 * descriptor has been written back
> 	 */
> 	rmb();
>
> With load_acquire() this can be changed to:
>
> 	if (!load_acquire(&rx_desc->wb.upper.status_error))
> 		break;

If I'm quickly reading the 'new' code I need to look up yet another
function, with the 'old' code I can easily see the logic.

You've also added a memory barrier to the 'break' path - which isn't needed.
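
(That is, in the old code the rmb() sits after the break, so the polls
that find no completed descriptor never pay for it:

	if (!rx_desc->wb.upper.status_error)
		break;		/* no barrier on this path */
	rmb();

whereas load_acquire() performs its ordering on every iteration, break
or not.)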

The driver might also have additional code that can be added before the barrier
so reducing the cost of the barrier.

The driver may also be able to perform multiple actions before a barrier is needed.

Hiding barriers isn't necessarily a good idea anyway.
If you are writing a driver you need to understand when and where they are needed.

Maybe you need a new (weaker) barrier to replace rmb() on some architectures.

...


	David
Alexander Duyck Nov. 14, 2014, 4 p.m. UTC | #3
On 11/14/2014 02:19 AM, Will Deacon wrote:
> Hi Alex,
>
> On Thu, Nov 13, 2014 at 07:27:23PM +0000, Alexander Duyck wrote:
>> It is common for device drivers to make use of acquire/release semantics
>> when dealing with descriptors stored in device memory.  On reviewing the
>> documentation and code for smp_load_acquire() and smp_store_release() as
>> well as reviewing an IBM website that goes over the use of PowerPC barriers
>> at http://www.ibm.com/developerworks/systems/articles/powerpc.html it
>> occurred to me that the same code could likely be applied to device drivers.
>>
>> As a result this patch introduces load_acquire() and store_release().  The
>> load_acquire() function can be used in situations where a test
>> for ownership must be followed by a memory barrier.  The example below is
>> from ixgbe:
>>
>>          if (!rx_desc->wb.upper.status_error)
>>                  break;
>>
>>          /* This memory barrier is needed to keep us from reading
>>           * any other fields out of the rx_desc until we know the
>>           * descriptor has been written back
>>           */
>>          rmb();
>>
>> With load_acquire() this can be changed to:
>>
>>          if (!load_acquire(&rx_desc->wb.upper.status_error))
>>                  break;
> I still don't think this is a good idea for the specific use-case you're
> highlighting.
>
> On ARM, an mb() can be *significantly* more expensive than an rmb() (since
> we may have to drain store buffers on an outer L2 cache) and on arm64 it's
> not at all clear that an LDAR is more efficient than an LDR; DMB LD
> sequence. I can certainly imagine implementations where the latter would
> be preferred.

Yeah, I am pretty sure I overdid it in using a mb() for arm.  I should
probably be using something like dmb(ish), which is what smp_mb() uses,
instead.  The general idea is to enforce ordering of memory-memory
accesses; memory-MMIO accesses should still use a full rmb()/wmb()
barrier.

The alternative I am mulling over is creating a lightweight set of
memory barriers named lw_mb(), lw_rmb(), and lw_wmb() that could be
used instead.  The general idea is that on many architectures a full
mb()/rmb()/wmb() is far too much for just guaranteeing ordering of
reads or writes that touch system memory only.  I could probably use
the smp_ varieties as a template for them, since in most cases that
should be correct.
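
A hypothetical sketch of what those might look like on arm, modeled on
the smp_* variants (names and mappings are illustrative only):

	#define lw_mb()		dmb(ish)	/* memory-memory only */
	#define lw_rmb()	dmb(ish)	/* no load-only dmb here */
	#define lw_wmb()	dmb(ishst)

(In mainline this idea eventually landed as dma_rmb()/dma_wmb().)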

Also, just to be clear I am not advocating replacing the wmb() in most 
I/O setups where we have to sync the system memory before doing the MMIO 
write.  This is for the case where the device descriptor ring has some 
bit indicating ownership by either the device or the CPU.  So for 
example on the r8169 they have to do a wmb() before writing the DescOwn 
bit in the first descriptor of a given set of Tx descriptors to 
guarantee the rest are written, then they set the DescOwn bit, then they 
call wmb() again to flush that last bit before notifying the device it 
can start fetching the descriptors.  My goal is to deal with that first
wmb() and leave the second as is, since it is correct.
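
Annotated, that flow looks roughly like this (a sketch, not the driver
verbatim; RTL_W8(TxPoll, NPQ) is the r8169 doorbell write):

	wmb();			/* 1st: other descriptors before DescOwn */
	txd->opts1 = cpu_to_le32(status | DescOwn);
	wmb();			/* 2nd: DescOwn visible before the MMIO kick */
	RTL_W8(TxPoll, NPQ);	/* tell the NIC to start fetching */

Only the first wmb() is a candidate for store_release(); the second
orders system memory against MMIO and has to stay.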

> So, whilst I'm perfectly fine to go along with mandatory acquire/release
> macros (we should probably add a check to barf on __iomem pointers), I
> don't agree with using them in preference to finer-grained read/write
> barriers. Doing so will have a real impact on I/O performance.

Couldn't that type of check be added to compiletime_assert_atomic_type?
That seems like the best place for it.
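
(For reference, that macro currently reads:

	#define compiletime_assert_atomic_type(t)			\
		compiletime_assert(__native_word(t),			\
			"Need native word sized stores/loads for atomicity.")

so an address-space check would sit naturally alongside the size check.)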


> Finally, do you know of any architectures where load_acquire/store_release
> aren't implemented the same way as the smp_* variants on SMP kernels?
>
> Will

I should go back through and sort out the cases where mb() and
smp_mb() are not the same thing.  I think I probably went with too
harsh a barrier in a couple of other cases.

Thanks,

Alex
Alexander Duyck Nov. 14, 2014, 4:58 p.m. UTC | #4
On 11/14/2014 02:45 AM, David Laight wrote:
> From: Alexander Duyck
>> It is common for device drivers to make use of acquire/release semantics
>> when dealing with descriptors stored in device memory.  On reviewing the
>> documentation and code for smp_load_acquire() and smp_store_release() as
>> well as reviewing an IBM website that goes over the use of PowerPC barriers
>> at http://www.ibm.com/developerworks/systems/articles/powerpc.html it
>> occurred to me that the same code could likely be applied to device drivers.
>>
>> As a result this patch introduces load_acquire() and store_release().  The
>> load_acquire() function can be used in situations where a test
>> for ownership must be followed by a memory barrier.  The example below is
>> from ixgbe:
>>
>> 	if (!rx_desc->wb.upper.status_error)
>> 		break;
>>
>> 	/* This memory barrier is needed to keep us from reading
>> 	 * any other fields out of the rx_desc until we know the
>> 	 * descriptor has been written back
>> 	 */
>> 	rmb();
>>
>> With load_acquire() this can be changed to:
>>
>> 	if (!load_acquire(&rx_desc->wb.upper.status_error))
>> 		break;
> If I'm quickly reading the 'new' code I need to look up yet another
> function, with the 'old' code I can easily see the logic.
>
> You've also added a memory barrier to the 'break' path - which isn't needed.
>
> The driver might also have additional code that can be added before the barrier
> so reducing the cost of the barrier.
>
> The driver may also be able to perform multiple actions before a barrier is needed.
>
> Hiding barriers isn't necessarily a good idea anyway.
> If you are writing a driver you need to understand when and where they are needed.
>
> Maybe you need a new (weaker) barrier to replace rmb() on some architectures.
>
> ...
>
>
> 	David

Yeah, I think I might explore creating some lightweight barriers.  The
load_acquire/store_release stuff is a bit of overkill for what is needed.

Thanks,

Alex

Patch

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index c6a3e73..bbdcd34 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@ 
 #define smp_wmb()	dmb(ishst)
 #endif
 
+#define store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define load_acquire(p)							\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	___p1;								\
+})
+
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 6389d60..c91571c 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -32,33 +32,7 @@ 
 #define rmb()		dsb(ld)
 #define wmb()		dsb(st)
 
-#ifndef CONFIG_SMP
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-
-#define smp_store_release(p, v)						\
-do {									\
-	compiletime_assert_atomic_type(*p);				\
-	barrier();							\
-	ACCESS_ONCE(*p) = (v);						\
-} while (0)
-
-#define smp_load_acquire(p)						\
-({									\
-	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
-	compiletime_assert_atomic_type(*p);				\
-	barrier();							\
-	___p1;								\
-})
-
-#else
-
-#define smp_mb()	dmb(ish)
-#define smp_rmb()	dmb(ishld)
-#define smp_wmb()	dmb(ishst)
-
-#define smp_store_release(p, v)						\
+#define store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	switch (sizeof(*p)) {						\
@@ -73,7 +47,7 @@  do {									\
 	}								\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define load_acquire(p)							\
 ({									\
 	typeof(*p) ___p1;						\
 	compiletime_assert_atomic_type(*p);				\
@@ -90,6 +64,35 @@  do {									\
 	___p1;								\
 })
 
+#ifndef CONFIG_SMP
+#define smp_mb()	barrier()
+#define smp_rmb()	barrier()
+#define smp_wmb()	barrier()
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
+#else
+
+#define smp_mb()	dmb(ish)
+#define smp_rmb()	dmb(ishld)
+#define smp_wmb()	dmb(ishst)
+
+#define smp_store_release(p, v)	store_release(p, v)
+#define smp_load_acquire(p)	load_acquire(p)
+
 #endif
 
 #define read_barrier_depends()		do { } while(0)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index a48957c..d7fe208 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -63,14 +63,14 @@ 
  * need for asm trickery!
  */
 
-#define smp_store_release(p, v)						\
+#define store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
 	ACCESS_ONCE(*p) = (v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define load_acquire(p)						\
 ({									\
 	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
@@ -78,6 +78,9 @@  do {									\
 	___p1;								\
 })
 
+#define smp_store_release(p, v)	store_release(p, v)
+#define smp_load_acquire(p)	load_acquire(p)
+
 /*
  * XXX check on this ---I suspect what Linus really wants here is
  * acquire vs release semantics but we can't discuss this stuff with
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index c7591e8..9beb687 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -85,6 +85,21 @@  static inline void fence(void)
 #define smp_read_barrier_depends()     do { } while (0)
 #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
 
+#define store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define load_acquire(p)							\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	___p1;								\
+})
+
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index d0101dd..fc7323c 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -180,6 +180,21 @@ 
 #define nudge_writes() mb()
 #endif
 
+#define store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define load_acquire(p)							\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	___p1;								\
+})
+
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index bab79a1..f2a0d73 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -37,6 +37,23 @@ 
 
 #define set_mb(var, value)	do { var = value; mb(); } while (0)
 
+#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+
+#define store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define load_acquire(p)							\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	___p1;								\
+})
+
 #ifdef CONFIG_SMP
 
 #ifdef __SUBARCH_HAS_LWSYNC
@@ -45,15 +62,12 @@ 
 #    define SMPWMB      eieio
 #endif
 
-#define __lwsync()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
 
 #define smp_mb()	mb()
 #define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
 #else
-#define __lwsync()	barrier()
-
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
@@ -72,7 +86,7 @@ 
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
-	__lwsync();							\
+	smp_rmb();							\
 	ACCESS_ONCE(*p) = (v);						\
 } while (0)
 
@@ -80,7 +94,7 @@  do {									\
 ({									\
 	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-	__lwsync();							\
+	smp_rmb();							\
 	___p1;								\
 })
 
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index b5dce65..637d7a9 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -35,14 +35,14 @@ 
 
 #define set_mb(var, value)		do { var = value; mb(); } while (0)
 
-#define smp_store_release(p, v)						\
+#define store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
 	ACCESS_ONCE(*p) = (v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define load_acquire(p)							\
 ({									\
 	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
@@ -50,4 +50,7 @@  do {									\
 	___p1;								\
 })
 
+#define smp_store_release(p, v)		store_release(p, v)
+#define smp_load_acquire(p)		load_acquire(p)
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 305dcc3..7de3c69 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -53,14 +53,14 @@  do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 
 #define smp_read_barrier_depends()	do { } while(0)
 
-#define smp_store_release(p, v)						\
+#define store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
 	ACCESS_ONCE(*p) = (v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define load_acquire(p)							\
 ({									\
 	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
@@ -68,6 +68,8 @@  do {									\
 	___p1;								\
 })
 
+#define smp_store_release(p, v)	store_release(p, v)
+#define smp_load_acquire(p)	load_acquire(p)
 #define smp_mb__before_atomic()	barrier()
 #define smp_mb__after_atomic()	barrier()
 
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 0f4460b..3d2aa18 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -103,6 +103,21 @@ 
  * model and we should fall back to full barriers.
  */
 
+#define store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define load_acquire(p)							\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	___p1;								\
+})
+
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
@@ -120,14 +135,14 @@  do {									\
 
 #else /* regular x86 TSO memory ordering */
 
-#define smp_store_release(p, v)						\
+#define store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
 	ACCESS_ONCE(*p) = (v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define load_acquire(p)							\
 ({									\
 	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
@@ -135,6 +150,9 @@  do {									\
 	___p1;								\
 })
 
+#define smp_store_release(p, v)	store_release(p, v)
+#define smp_load_acquire(p)	load_acquire(p)
+
 #endif
 
 /* Atomic operations are already serializing on x86 */
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 1402fa8..c6e4b99 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -70,6 +70,21 @@ 
 #define smp_mb__after_atomic()	smp_mb()
 #endif
 
+#define store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define load_acquire(p)							\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	mb();								\
+	___p1;								\
+})
+
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\