[v3,6/6] powerpc: atomic: Implement cmpxchg{, 64}_* and atomic{, 64}_cmpxchg_* variants

Message ID 1444659246-24769-7-git-send-email-boqun.feng@gmail.com (mailing list archive)
State Superseded

Commit Message

Boqun Feng Oct. 12, 2015, 2:14 p.m. UTC
Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
which _release variants can be built.

To avoid superfluous barriers in _acquire variants, we implement these
operations with assembly code rather than use __atomic_op_acquire() to build
them automatically.

For the same reason, we keep the assembly implementation of fully
ordered cmpxchg operations.

Note cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not
compiler barriers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/atomic.h  |  10 +++
 arch/powerpc/include/asm/cmpxchg.h | 141 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 150 insertions(+), 1 deletion(-)
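
For context: the variants this patch does not override are built from the
_relaxed ones by generic helpers in include/linux/atomic.h. Roughly, as a
paraphrase of the generic code of that era rather than an exact quote:

	#define __atomic_op_acquire(op, args...)			\
	({								\
		typeof(op##_relaxed(args)) __ret = op##_relaxed(args);	\
		smp_mb__after_atomic();		/* barrier after op */	\
		__ret;							\
	})

	#define __atomic_op_release(op, args...)			\
	({								\
		smp_mb__before_atomic();	/* barrier before op */	\
		op##_relaxed(args);					\
	})

The assembly _acquire versions in this patch instead place
PPC_ACQUIRE_BARRIER so that a failed compare branches past it (the
"bne- 2f" jumps over the barrier), avoiding the superfluous barrier the
generic composition above would always execute.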

Comments

Will Deacon Oct. 13, 2015, 1:24 p.m. UTC | #1
On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
> Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
> which _release variants can be built.
> 
> To avoid superfluous barriers in _acquire variants, we implement these
> operations with assembly code rather than use __atomic_op_acquire() to build
> them automatically.

The "superfluous barriers" are for the case where the cmpxchg fails, right?
And you don't do the same thing for release, because you want to avoid a
barrier in the middle of the critical section?

(just checking I understand your reasoning).

Will
Boqun Feng Oct. 13, 2015, 2:32 p.m. UTC | #2
On Tue, Oct 13, 2015 at 02:24:04PM +0100, Will Deacon wrote:
> On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
> > Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
> > which _release variants can be built.
> > 
> > To avoid superfluous barriers in _acquire variants, we implement these
> > operations with assembly code rather than use __atomic_op_acquire() to build
> > them automatically.
> 
> The "superfluous barriers" are for the case where the cmpxchg fails, right?

Yes.

> And you don't do the same thing for release, because you want to avoid a
> barrier in the middle of the critical section?
> 

Mostly because of the comments in include/linux/atomic.h:

 * For compound atomics performing both a load and a store, ACQUIRE
 * semantics apply only to the load and RELEASE semantics only to the
 * store portion of the operation. Note that a failed cmpxchg_acquire
 * does -not- imply any memory ordering constraints.

so I thought only the barrier in cmpxchg_acquire() is conditional, and
the barrier in cmpxchg_release() is not. Maybe we'd better call out
that the cmpxchg *family* doesn't have any ordering guarantee if the
compare fails, as a complement of

ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")

Because it seems this commit only claims that the barriers in the
fully ordered version are conditional.


If cmpxchg_release doesn't have an ordering guarantee when it fails, I
guess I can implement it with a barrier in the middle as you mentioned:

	unsigned int prev;

	__asm__ __volatile__ (
"1:	lwarx	%0,0,%2		
	cmpw	0,%0,%3\n\
	bne-	2f\n"
	PPC_RELEASE_BARRIER
"	stwcx.	%4,0,%2\n\
	bne-	1b"
	"\n\
2:"
	: "=&r" (prev), "+m" (*p)
	: "r" (p), "r" (old), "r" (new)
	: "cc", "memory");

	return prev;


However, I need to check whether the architecture allows this and
whether any other problem exists.

Besides, I don't think it's a good idea to do the "put the barrier in
the middle" thing in this patchset, because that seems like a premature
optimization, and if we go further, I guess we could also replace the
PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered
version of cmpxchg(). Too much would need investigating then..

> (just checking I understand your reasoning).
> 

That actually helped me find a probably better implementation, if the
architecture allows it. Thank you ;-)

Regards,
Boqun
Will Deacon Oct. 13, 2015, 2:43 p.m. UTC | #3
On Tue, Oct 13, 2015 at 10:32:59PM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 02:24:04PM +0100, Will Deacon wrote:
> > On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
> > > Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
> > > which _release variants can be built.
> > > 
> > > To avoid superfluous barriers in _acquire variants, we implement these
> > > operations with assembly code rather than use __atomic_op_acquire() to build
> > > them automatically.
> > 
> > The "superfluous barriers" are for the case where the cmpxchg fails, right?
> 
> Yes.
> 
> > And you don't do the same thing for release, because you want to avoid a
> > barrier in the middle of the critical section?
> > 
> 
> Mostly because of the comments in include/linux/atomic.h:
> 
>  * For compound atomics performing both a load and a store, ACQUIRE
>  * semantics apply only to the load and RELEASE semantics only to the
>  * store portion of the operation. Note that a failed cmpxchg_acquire
>  * does -not- imply any memory ordering constraints.
> 
> so I thought only the barrier in cmpxchg_acquire() is conditional, and
> the barrier in cmpxchg_release() is not. Maybe we'd better call out
> that the cmpxchg *family* doesn't have any ordering guarantee if the
> compare fails, as a complement of
> 
> ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> 
> Because it seems this commit only claims that the barriers in the
> fully ordered version are conditional.

I didn't think this was ambiguous... A failed cmpxchg_release doesn't
perform a store, and because the RELEASE semantics apply only to the
store portion of the operation, it doesn't have any ordering
guarantees. Acquire is called out as a special case because it *does*
actually perform a load on the failure case.
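
Conceptually, a sketch of the semantics (placeholder names, not an
implementation):

	old = load(ptr);		/* the load portion */
	if (old != expected)
		return old;		/* failed: no store, so RELEASE,
					 * which applies to the store,
					 * orders nothing */
	release_barrier();
	store(ptr, new);		/* the store portion, ordered by
					 * the barrier above */
	return old;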

> If cmpxchg_release doesn't have an ordering guarantee when it fails, I
> guess I can implement it with a barrier in the middle as you mentioned:
> 
> 	unsigned int prev;
> 
> 	__asm__ __volatile__ (
> "1:	lwarx	%0,0,%2		
> 	cmpw	0,%0,%3\n\
> 	bne-	2f\n"
> 	PPC_RELEASE_BARRIER
> "	stwcx.	%4,0,%2\n\
> 	bne-	1b"
> 	"\n\
> 2:"
> 	: "=&r" (prev), "+m" (*p)
> 	: "r" (p), "r" (old), "r" (new)
> 	: "cc", "memory");
> 
> 	return prev;
> 
> 
> However, I need to check whether the architecture allows this and
> whether any other problem exists.
> 
> Besides, I don't think it's a good idea to do the "put the barrier in
> the middle" thing in this patchset, because that seems like a premature
> optimization, and if we go further, I guess we could also replace the
> PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered
> version of cmpxchg(). Too much would need investigating then..

Putting a barrier in the middle of that critical section is probably a
terrible idea, and that's why I thought you were avoiding it (hence my
original question). Perhaps just add a comment to that effect, since I
fear adding more words to memory-barriers.txt is just likely to create
further confusion.

Will
Boqun Feng Oct. 13, 2015, 2:46 p.m. UTC | #4
On Tue, Oct 13, 2015 at 10:32:59PM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 02:24:04PM +0100, Will Deacon wrote:
> > On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
> > > Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
> > > which _release variants can be built.
> > > 
> > > To avoid superfluous barriers in _acquire variants, we implement these
> > > operations with assembly code rather than use __atomic_op_acquire() to build
> > > them automatically.
> > 
> > The "superfluous barriers" are for the case where the cmpxchg fails, right?
> 
> Yes.
> 
> > And you don't do the same thing for release, because you want to avoid a
> > barrier in the middle of the critical section?
> > 
> 
> Mostly because of the comments in include/linux/atomic.h:
> 
>  * For compound atomics performing both a load and a store, ACQUIRE
>  * semantics apply only to the load and RELEASE semantics only to the
>  * store portion of the operation. Note that a failed cmpxchg_acquire
>  * does -not- imply any memory ordering constraints.
> 
> so I thought only the barrier in cmpxchg_acquire() is conditional, and
> the barrier in cmpxchg_release() is not. Maybe we'd better call out
> that the cmpxchg *family* doesn't have any ordering guarantee if the
> compare fails, as a complement of
> 
> ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> 
> Because it seems this commit only claims that the barriers in the
> fully ordered version are conditional.
> 
> 
> If cmpxchg_release doesn't have an ordering guarantee when it fails, I
> guess I can implement it with a barrier in the middle as you mentioned:
> 
> 	unsigned int prev;
> 
> 	__asm__ __volatile__ (
> "1:	lwarx	%0,0,%2		
> 	cmpw	0,%0,%3\n\
> 	bne-	2f\n"
> 	PPC_RELEASE_BARRIER
> "	stwcx.	%4,0,%2\n\
> 	bne-	1b"
> 	"\n\
> 2:"
> 	: "=&r" (prev), "+m" (*p)
> 	: "r" (p), "r" (old), "r" (new)
> 	: "cc", "memory");
> 
> 	return prev;
> 
> 
> However, I need to check whether the architecture allows this and
> whether any other problem exists.
> 
> Besides, I don't think it's a good idea to do the "put the barrier in
> the middle" thing in this patchset, because that seems like a premature
> optimization, and if we go further, I guess we could also replace the
> PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered

Correction: we can't just put the sync in the middle to implement a
fully ordered version. Sorry for that mistake..

Regards,
Boqun

> version of cmpxchg(). Too much would need investigating then..
> 
> > (just checking I understand your reasoning).
> > 
> 
> That actually helped me find a probably better implementation, if the
> architecture allows it. Thank you ;-)
> 
> Regards,
> Boqun
Boqun Feng Oct. 13, 2015, 2:58 p.m. UTC | #5
On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> On Tue, Oct 13, 2015 at 10:32:59PM +0800, Boqun Feng wrote:
[snip]
> > 
> > Mostly because of the comments in include/linux/atomic.h:
> > 
> >  * For compound atomics performing both a load and a store, ACQUIRE
> >  * semantics apply only to the load and RELEASE semantics only to the
> >  * store portion of the operation. Note that a failed cmpxchg_acquire
> >  * does -not- imply any memory ordering constraints.
> > 
> > so I thought only the barrier in cmpxchg_acquire() is conditional, and
> > the barrier in cmpxchg_release() is not. Maybe we'd better call out
> > that the cmpxchg *family* doesn't have any ordering guarantee if the
> > compare fails, as a complement of
> > 
> > ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> > 
> > Because it seems this commit only claims that the barriers in the
> > fully ordered version are conditional.
> 
> I didn't think this was ambiguous... A failed cmpxchg_release doesn't
> perform a store, and because the RELEASE semantics apply only to the
> store portion of the operation, it doesn't have any ordering
> guarantees. Acquire is called out as a special case because it *does*
> actually perform a load on the failure case.
> 

Makes sense.

> > If cmpxchg_release doesn't have an ordering guarantee when it fails, I
> > guess I can implement it with a barrier in the middle as you mentioned:
> > 
> > 	unsigned int prev;
> > 
> > 	__asm__ __volatile__ (
> > "1:	lwarx	%0,0,%2		
> > 	cmpw	0,%0,%3\n\
> > 	bne-	2f\n"
> > 	PPC_RELEASE_BARRIER
> > "	stwcx.	%4,0,%2\n\
> > 	bne-	1b"
> > 	"\n\
> > 2:"
> > 	: "=&r" (prev), "+m" (*p)
> > 	: "r" (p), "r" (old), "r" (new)
> > 	: "cc", "memory");
> > 
> > 	return prev;
> > 
> > 
> > However, I need to check whether the architecture allows this and
> > whether any other problem exists.
> > 
> > Besides, I don't think it's a good idea to do the "put the barrier in
> > the middle" thing in this patchset, because that seems like a premature
> > optimization, and if we go further, I guess we could also replace the
> > PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered
> > version of cmpxchg(). Too much would need investigating then..
> 
> Putting a barrier in the middle of that critical section is probably a
> terrible idea, and that's why I thought you were avoiding it (hence my

The fact is that I hadn't thought of that way to implement
cmpxchg_release before you asked that question ;-) And I'm not going to
do that for now, and probably not in the future.

> original question). Perhaps just add a comment to that effect, since I

Are you suggesting that if I put a barrier in the middle I'd better add
a comment? So if I don't do that, it's OK to leave this patch as it is.

Regards,
Boqun

> fear adding more words to memory-barriers.txt is just likely to create
> further confusion.
> 
> Will
Will Deacon Oct. 13, 2015, 3:04 p.m. UTC | #6
On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > Putting a barrier in the middle of that critical section is probably a
> > terrible idea, and that's why I thought you were avoiding it (hence my
> 
> The fact is that I hadn't thought of that way to implement
> cmpxchg_release before you asked that question ;-) And I'm not going to
> do that for now, and probably not in the future.
> 
> > original question). Perhaps just add a comment to that effect, since I
> 
> Are you suggesting that if I put a barrier in the middle I'd better add
> a comment? So if I don't do that, it's OK to leave this patch as it is.

No, I mean put a comment in your file to explain the reason why you
override _relaxed and _acquire, but not _release (because overriding
_release would introduce this weird barrier in the middle of the critical
section, which would likely cause the conditional store to fail).

Will
Boqun Feng Oct. 13, 2015, 3:45 p.m. UTC | #7
On Tue, Oct 13, 2015 at 04:04:27PM +0100, Will Deacon wrote:
> On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> > On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > > Putting a barrier in the middle of that critical section is probably a
> > > terrible idea, and that's why I thought you were avoiding it (hence my
> > 
> > The fact is that I hadn't thought of that way to implement
> > cmpxchg_release before you asked that question ;-) And I'm not going to
> > do that for now, and probably not in the future.
> > 
> > > original question). Perhaps just add a comment to that effect, since I
> > 
> > Are you suggesting that if I put a barrier in the middle I'd better add
> > a comment? So if I don't do that, it's OK to leave this patch as it is.
> 
> No, I mean put a comment in your file to explain the reason why you
> override _relaxed and _acquire, but not _release (because overriding
> _release would introduce this weird barrier in the middle of the critical
> section, which would likely cause the conditional store to fail).
> 

Good idea, will do that. Thank you ;-)
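
Maybe something like this (just a first sketch of the wording):

	/*
	 * cmpxchg_relaxed, cmpxchg_acquire and the fully ordered
	 * cmpxchg are implemented in assembly so that a failed compare
	 * doesn't pay for a barrier it doesn't need.  cmpxchg_release
	 * is not overridden: doing better than the generic version
	 * (barrier before the whole ll/sc loop) would mean putting the
	 * barrier between the lwarx and the stwcx., which would likely
	 * make the conditional store fail.
	 */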

Regards,
Boqun
Boqun Feng Oct. 14, 2015, 1:47 a.m. UTC | #8
On Tue, Oct 13, 2015 at 04:04:27PM +0100, Will Deacon wrote:
> On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> > On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > > Putting a barrier in the middle of that critical section is probably a
> > > terrible idea, and that's why I thought you were avoiding it (hence my
> > 
> > The fact is that I hadn't thought of that way to implement
> > cmpxchg_release before you asked that question ;-) And I'm not going to
> > do that for now, and probably not in the future.
> > 
> > > original question). Perhaps just add a comment to that effect, since I
> > 
> > Are you suggesting that if I put a barrier in the middle I'd better add
> > a comment? So if I don't do that, it's OK to leave this patch as it is.
> 
> No, I mean put a comment in your file to explain the reason why you
> override _relaxed and _acquire, but not _release (because overriding

You mean overriding _acquire and the fully ordered version, right?

> _release would introduce this weird barrier in the middle of the critical
> section, which would likely cause the conditional store to fail).
> 
> Will
Will Deacon Oct. 14, 2015, 9:40 a.m. UTC | #9
On Wed, Oct 14, 2015 at 09:47:35AM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 04:04:27PM +0100, Will Deacon wrote:
> > On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> > > On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > > > Putting a barrier in the middle of that critical section is probably a
> > > > terrible idea, and that's why I thought you were avoiding it (hence my
> > > 
> > > The fact is that I hadn't thought of that way to implement
> > > cmpxchg_release before you asked that question ;-) And I'm not going to
> > > do that for now, and probably not in the future.
> > > 
> > > > original question). Perhaps just add a comment to that effect, since I
> > > 
> > > Are you suggesting that if I put a barrier in the middle I'd better add
> > > a comment? So if I don't do that, it's OK to leave this patch as it is.
> > 
> > No, I mean put a comment in your file to explain the reason why you
> > override _relaxed and _acquire, but not _release (because overriding
> 
> You mean overriding _acquire and the fully ordered version, right?

Yes, my mistake. Sounds like you get my drift, though.

Will

Patch

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 8869fb3..9f44515 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -191,6 +191,11 @@  static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 #define atomic_dec_return_relaxed atomic_dec_return_relaxed
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic_cmpxchg_relaxed(v, o, n) \
+	cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic_cmpxchg_acquire(v, o, n) \
+	cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
@@ -459,6 +464,11 @@  static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
 }
 
 #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic64_cmpxchg_relaxed(v, o, n) \
+	cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic64_cmpxchg_acquire(v, o, n) \
+	cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 17c7e14..fd4ea04 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -181,6 +181,48 @@  __cmpxchg_u32_local(volatile unsigned int *p, unsigned long old,
 	return prev;
 }
 
+static __always_inline unsigned long
+__cmpxchg_u32_relaxed(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_relaxed\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc");
+
+	return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_acquire\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	PPC_ACQUIRE_BARRIER
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -224,6 +266,46 @@  __cmpxchg_u64_local(volatile unsigned long *p, unsigned long old,
 
 	return prev;
 }
+
+static __always_inline unsigned long
+__cmpxchg_u64_relaxed(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_relaxed\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc");
+
+	return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_acquire\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	PPC_ACQUIRE_BARRIER
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
 #endif
 
 /* This function doesn't exist, so you'll get a linker error
@@ -262,6 +344,37 @@  __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 	return old;
 }
 
+static __always_inline unsigned long
+__cmpxchg_relaxed(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_relaxed(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_relaxed(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
+
+static __always_inline unsigned long
+__cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_acquire(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_acquire(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
 #define cmpxchg(ptr, o, n)						 \
   ({									 \
      __typeof__(*(ptr)) _o_ = (o);					 \
@@ -279,6 +392,23 @@  __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 				    (unsigned long)_n_, sizeof(*(ptr))); \
   })
 
+#define cmpxchg_relaxed(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_relaxed((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
+
+#define cmpxchg_acquire(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_acquire((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
 #ifdef CONFIG_PPC64
 #define cmpxchg64(ptr, o, n)						\
   ({									\
@@ -290,7 +420,16 @@  __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
 	cmpxchg_local((ptr), (o), (n));					\
   })
-#define cmpxchg64_relaxed	cmpxchg64_local
+#define cmpxchg64_relaxed(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_relaxed((ptr), (o), (n));				\
+})
+#define cmpxchg64_acquire(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_acquire((ptr), (o), (n));				\
+})
 #else
 #include <asm-generic/cmpxchg-local.h>
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))