[RFC,v2,5/7] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants

Message ID 1442418575-12297-6-git-send-email-boqun.feng@gmail.com (mailing list archive)
State Superseded

Commit Message

Boqun Feng Sept. 16, 2015, 3:49 p.m. UTC
Unlike other atomic operation variants, cmpxchg{,64}_acquire and
atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
fails, so we need to implement these using assembly.

Note cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not
compiler barriers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/atomic.h  |  10 +++
 arch/powerpc/include/asm/cmpxchg.h | 141 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 150 insertions(+), 1 deletion(-)

Comments

Peter Zijlstra Oct. 1, 2015, 12:27 p.m. UTC | #1
On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> fails, so we need to implement these using assembly.

I think that is actually expected and documented. That is, a cmpxchg
only implies barriers on success. See:

  ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
Peter Zijlstra Oct. 1, 2015, 12:36 p.m. UTC | #2
On Thu, Oct 01, 2015 at 02:27:15PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> > Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> > atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> > fails, so we need to implement these using assembly.
> 
> I think that is actually expected and documented. That is, a cmpxchg
> only implies barriers on success. See:
> 
>   ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")

Also:

654672d4ba1a6 (Will Deacon     2015-08-06 17:54:37 +0100  28)  * store portion of the operation. Note that a failed cmpxchg_acquire
654672d4ba1a6 (Will Deacon     2015-08-06 17:54:37 +0100  29)  * does -not- imply any memory ordering constraints.
Paul E. McKenney Oct. 1, 2015, 3:12 p.m. UTC | #3
On Thu, Oct 01, 2015 at 02:36:26PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 01, 2015 at 02:27:15PM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> > > Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> > > atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> > > fails, so we need to implement these using assembly.
> > 
> > I think that is actually expected and documented. That is, a cmpxchg
> > only implies barriers on success. See:
> > 
> >   ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> 
> Also:
> 
> 654672d4ba1a6 (Will Deacon     2015-08-06 17:54:37 +0100  28)  * store portion of the operation. Note that a failed cmpxchg_acquire
> 654672d4ba1a6 (Will Deacon     2015-08-06 17:54:37 +0100  29)  * does -not- imply any memory ordering constraints.

What C11 does is to allow the developer to specify different orderings
on success and failure.  But it is no harder to supply a barrier (if
needed) on the failure path, right?

							Thanx, Paul
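
A minimal C11 sketch of that distinction, using a made-up try_take()
helper (the failure-path fence is only for callers that actually need
ordering when the cmp fails):

	#include <stdatomic.h>
	#include <stdbool.h>

	static bool try_take(_Atomic int *lock)
	{
		int expected = 0;

		/* acquire ordering on success, relaxed on failure */
		if (atomic_compare_exchange_strong_explicit(lock, &expected, 1,
							    memory_order_acquire,
							    memory_order_relaxed))
			return true;

		/* supply the ordering by hand on the failure path, if needed */
		atomic_thread_fence(memory_order_acquire);
		return false;
	}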
Paul E. McKenney Oct. 1, 2015, 3:13 p.m. UTC | #4
On Thu, Oct 01, 2015 at 02:36:26PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 01, 2015 at 02:27:15PM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> > > Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> > > atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> > > fails, so we need to implement these using assembly.
> > 
> > I think that is actually expected and documented. That is, a cmpxchg
> > only implies barriers on success. See:
> > 
> >   ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> 
> Also:
> 
> 654672d4ba1a6 (Will Deacon     2015-08-06 17:54:37 +0100  28)  * store portion of the operation. Note that a failed cmpxchg_acquire
> 654672d4ba1a6 (Will Deacon     2015-08-06 17:54:37 +0100  29)  * does -not- imply any memory ordering constraints.

Agreed, no need for ordering on failed cmpxchg.

							Thanx, Paul
Peter Zijlstra Oct. 1, 2015, 5:11 p.m. UTC | #5
On Thu, Oct 01, 2015 at 08:12:19AM -0700, Paul E. McKenney wrote:

> What C11 does is to allow the developer to specify different orderings
> on success and failure.  But it is no harder to supply a barrier (if
> needed) on the failure path, right?

Quite right.
Boqun Feng Oct. 10, 2015, 1:58 a.m. UTC | #6
Hi Peter,

Sorry for replying late.

On Thu, Oct 01, 2015 at 02:27:16PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> > Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> > atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> > fails, so we need to implement these using assembly.
> 
> I think that is actually expected and documented. That is, a cmpxchg
> only implies barriers on success. See:
> 
>   ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")

I probably didn't make myself clear here. My point is that if we use
__atomic_op_acquire() to build *_cmpxchg_acquire (for ARM and PowerPC),
the barrier will be implied _unconditionally_, meaning that no matter
whether the cmp fails or not, there will be a barrier after the cmpxchg
operation. Therefore we have to use assembly to implement these
operations right now.
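
Roughly, the generic wrapper added in 654672d4ba1a6 builds the _acquire
variant like this (a sketch of the pattern, not the exact header text),
so the trailing barrier is executed whether or not the cmp succeeded:

	#define __atomic_op_acquire(op, args...)			\
	({								\
		typeof(op##_relaxed(args)) __ret = op##_relaxed(args);	\
		smp_mb__after_atomic();	/* runs on failure too */	\
		__ret;							\
	})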

Regards,
Boqun
Boqun Feng Oct. 11, 2015, 10:25 a.m. UTC | #7
On Sat, Oct 10, 2015 at 09:58:05AM +0800, Boqun Feng wrote:
> Hi Peter,
> 
> Sorry for replying late.
> 
> On Thu, Oct 01, 2015 at 02:27:16PM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> > > Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> > > atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> > > fails, so we need to implement these using assembly.
> > 
> > I think that is actually expected and documented. That is, a cmpxchg
> > only implies barriers on success. See:
> > 
> >   ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> 
> I probably didn't make myself clear here, my point is that if we use
> __atomic_op_acquire() to built *_cmpchg_acquire(For ARM and PowerPC),
> the barrier will be implied _unconditionally_, meaning no matter cmp
> fails or not, there will be a barrier after the cmpxchg operation.
> Therefore we have to use assembly to implement the operations right now.
> 

Or let me try another way to explain this. What I wanted to say here is
that unlike the implementation of the xchg family, which only needs to
implement the _relaxed version and *remove* the fully ordered version,
the implementation of the cmpxchg family needs to *retain* the fully
ordered version and implement the _acquire version in assembly. That is
because if we use __atomic_op_*(), the barriers in the cmpxchg family
will be implied *unconditionally*, for example:

cmpxchg() on PPC will be (as built by my version of __atomic_op_fence()):

	smp_lwsync();
	cmpxchg_relaxed(...);
	smp_mb__after_atomic(); // a full barrier regardless of success
				// or failure.

In order to have a conditional barrier, we need a way to jump out of an
ll/sc loop, which could only(?) be done in assembly code.
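
For comparison, the semantics being aimed at can be written at the C
level with the GCC __atomic builtins (just a sketch, with a made-up
cmpxchg_u32_acquire_sketch() name); the hand-coded lwarx/stwcx. loop in
the patch below gets the conditional barrier by having the failed-compare
branch (bne- 2f) jump past PPC_ACQUIRE_BARRIER:

	/*
	 * Acquire ordering on success, none on failure; returns the
	 * previous value, like cmpxchg() does.
	 */
	static inline unsigned int
	cmpxchg_u32_acquire_sketch(unsigned int *p, unsigned int old,
				   unsigned int new)
	{
		unsigned int expected = old;

		__atomic_compare_exchange_n(p, &expected, new, false,
					    __ATOMIC_ACQUIRE,
					    __ATOMIC_RELAXED);

		/* on failure, expected is updated to the value observed */
		return expected;
	}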

My commit log surely failed to explain this clearly; I will modify that
in the next series. In the meantime, I'm looking forward to suggestions
on the implementation of the cmpxchg family ;-)

BTW, Will, could you please check whether the barriers in the cmpxchg
family are unconditional or not in the current ARM implementation? IIUC,
they are currently unconditional, right?

Regards,
Boqun
Peter Zijlstra Oct. 12, 2015, 6:46 a.m. UTC | #8
On Sun, Oct 11, 2015 at 06:25:20PM +0800, Boqun Feng wrote:
> On Sat, Oct 10, 2015 at 09:58:05AM +0800, Boqun Feng wrote:
> > Hi Peter,
> > 
> > Sorry for replying late.
> > 
> > On Thu, Oct 01, 2015 at 02:27:16PM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> > > > Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> > > > atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> > > > fails, so we need to implement these using assembly.
> > > 
> > > I think that is actually expected and documented. That is, a cmpxchg
> > > only implies barriers on success. See:
> > > 
> > >   ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> > 
> > I probably didn't make myself clear here, my point is that if we use
> > __atomic_op_acquire() to built *_cmpchg_acquire(For ARM and PowerPC),
> > the barrier will be implied _unconditionally_, meaning no matter cmp
> > fails or not, there will be a barrier after the cmpxchg operation.
> > Therefore we have to use assembly to implement the operations right now.

See later, but no, you don't _have_ to.

> Or let me try another way to explain this. What I wanted to say here is
> that unlike the implementation of xchg family, which needs only to
> implement _relaxed version and *remove* the fully ordered version, the
> implementation of cmpxchg family needs to *remain* the fully ordered
> version and implement the _acquire version in assembly. Because if we
> use __atomic_op_*(), the barriers in the cmpxchg family will be implied
> *unconditionally*, for example:

So the point that confused me, and which is still valid for the above,
is your use of 'need'.

You don't need to omit the barrier at all. It's perfectly valid to issue
too many barriers (pointless and a waste of time, yes; incorrect, no).

So what you want to say is: "Optimize cmpxchg_acquire() to avoid
superfluous barrier".
Boqun Feng Oct. 12, 2015, 7:03 a.m. UTC | #9
On Mon, Oct 12, 2015 at 08:46:21AM +0200, Peter Zijlstra wrote:
> On Sun, Oct 11, 2015 at 06:25:20PM +0800, Boqun Feng wrote:
> > On Sat, Oct 10, 2015 at 09:58:05AM +0800, Boqun Feng wrote:
> > > Hi Peter,
> > > 
> > > Sorry for replying late.
> > > 
> > > On Thu, Oct 01, 2015 at 02:27:16PM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> > > > > Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> > > > > atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> > > > > fails, so we need to implement these using assembly.
> > > > 
> > > > I think that is actually expected and documented. That is, a cmpxchg
> > > > only implies barriers on success. See:
> > > > 
> > > >   ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")
> > > 
> > > I probably didn't make myself clear here, my point is that if we use
> > > __atomic_op_acquire() to built *_cmpchg_acquire(For ARM and PowerPC),
> > > the barrier will be implied _unconditionally_, meaning no matter cmp
> > > fails or not, there will be a barrier after the cmpxchg operation.
> > > Therefore we have to use assembly to implement the operations right now.
> 
> See later, but no, you don't _have_ to.
> 
> > Or let me try another way to explain this. What I wanted to say here is
> > that unlike the implementation of xchg family, which needs only to
> > implement _relaxed version and *remove* the fully ordered version, the
> > implementation of cmpxchg family needs to *remain* the fully ordered
> > version and implement the _acquire version in assembly. Because if we
> > use __atomic_op_*(), the barriers in the cmpxchg family will be implied
> > *unconditionally*, for example:
> 
> So the point that confused me, and which is still valid for the above,
> is your use of 'need'.
> 

Indeed, my apologies for confusing things further.

> You don't need to omit the barrier at all. Its perfectly valid to issue
> too many barriers (pointless and a waste of time, yes; incorrect, no).
> 

Agreed.

> So what you want to say is: "Optimize cmpxchg_acquire() to avoid
> superfluous barrier".

Yes. I would like to implement a cmpxchg_acquire that really has no
barrier if the cmp fails, which became my 'need' for the implementation.
But you are right, we don't have to omit the barrier. I will modify the
commit log to explain this clearly in the next version. Thank you!

Regards,
Boqun

Patch

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index d9f570b..0608e39 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -192,6 +192,11 @@  static __inline__ int atomic_dec_return(atomic_t *v)
 }
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic_cmpxchg_relaxed(v, o, n) \
+	cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic_cmpxchg_acquire(v, o, n) \
+	cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
@@ -461,6 +466,11 @@  static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
 }
 
 #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic64_cmpxchg_relaxed(v, o, n) \
+	cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic64_cmpxchg_acquire(v, o, n) \
+	cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 66374f4..f40f295 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -254,6 +254,48 @@  __cmpxchg_u32_local(volatile unsigned int *p, unsigned long old,
 	return prev;
 }
 
+static __always_inline unsigned long
+__cmpxchg_u32_relaxed(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_relaxed\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc");
+
+	return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_acquire\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	PPC_ACQUIRE_BARRIER
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -297,6 +339,46 @@  __cmpxchg_u64_local(volatile unsigned long *p, unsigned long old,
 
 	return prev;
 }
+
+static __always_inline unsigned long
+__cmpxchg_u64_relaxed(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_relaxed\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc");
+
+	return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_acquire\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	PPC_ACQUIRE_BARRIER
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
 #endif
 
 /* This function doesn't exist, so you'll get a linker error
@@ -335,6 +417,37 @@  __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 	return old;
 }
 
+static __always_inline unsigned long
+__cmpxchg_relaxed(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_relaxed(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_relaxed(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
+
+static __always_inline unsigned long
+__cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_acquire(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_acquire(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
 #define cmpxchg(ptr, o, n)						 \
   ({									 \
      __typeof__(*(ptr)) _o_ = (o);					 \
@@ -352,6 +465,23 @@  __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 				    (unsigned long)_n_, sizeof(*(ptr))); \
   })
 
+#define cmpxchg_relaxed(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_relaxed((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
+
+#define cmpxchg_acquire(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_acquire((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
 #ifdef CONFIG_PPC64
 #define cmpxchg64(ptr, o, n)						\
   ({									\
@@ -363,7 +493,16 @@  __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
 	cmpxchg_local((ptr), (o), (n));					\
   })
-#define cmpxchg64_relaxed	cmpxchg64_local
+#define cmpxchg64_relaxed(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_relaxed((ptr), (o), (n));				\
+})
+#define cmpxchg64_acquire(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_acquire((ptr), (o), (n));				\
+})
 #else
 #include <asm-generic/cmpxchg-local.h>
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))