
[PATCHv5,2/2] memory barrier: adding smp_mb__after_lock

Message ID: 20090703081445.GG2902@jolsa.lab.eng.brq.redhat.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Jiri Olsa July 3, 2009, 8:14 a.m. UTC
Adding an smp_mb__after_lock define to be used as an smp_mb() call after
a lock.

Making it a nop for x86, since {read|write|spin}_lock() on x86 are
full memory barriers.

wbr,
jirka


Signed-off-by: Jiri Olsa <jolsa@redhat.com>

---
 arch/x86/include/asm/spinlock.h |    3 +++
 include/linux/spinlock.h        |    5 +++++
 include/net/sock.h              |    5 ++++-
 3 files changed, 12 insertions(+), 1 deletions(-)
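
For orientation, a minimal sketch of the pattern the define targets (a
hypothetical caller with an illustrative lock name; the real user in this
series is sk_has_sleeper() in the sock.h hunk below, which relies on being
called right after a read_lock):

	read_lock(&some_lock);
	smp_mb__after_lock();	/* smp_mb() in general; a nop on x86,
				 * where read_lock() is already a full
				 * memory barrier */
	/* ... critical section ... */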


Comments

Ingo Molnar July 3, 2009, 9:06 a.m. UTC | #1
* Jiri Olsa <jolsa@redhat.com> wrote:

> +++ b/arch/x86/include/asm/spinlock.h
> @@ -302,4 +302,7 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
>  #define _raw_read_relax(lock)	cpu_relax()
>  #define _raw_write_relax(lock)	cpu_relax()
>  
> +/* The {read|write|spin}_lock() on x86 are full memory barriers. */
> +#define smp_mb__after_lock() do { } while (0)

Two small stylistic comments, please make this an inline function:

static inline void smp_mb__after_lock(void) { }
#define smp_mb__after_lock

(untested)
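
One caveat with the sketch above: with an empty object-like define, a call
site written as smp_mb__after_lock(); would expand to a bare (); and break
the build. The usual idiom (equally untested here) is a self-referential
marker define, so call sites still resolve to the inline function:

	static inline void smp_mb__after_lock(void) { }
	#define smp_mb__after_lock smp_mb__after_lock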

> +/* The lock does not imply full memory barrier. */
> +#ifndef smp_mb__after_lock
> +#define smp_mb__after_lock() smp_mb()
> +#endif

ditto.

	Ingo
Eric Dumazet July 3, 2009, 9:20 a.m. UTC | #2
Ingo Molnar a écrit :
> * Jiri Olsa <jolsa@redhat.com> wrote:
> 
>> +++ b/arch/x86/include/asm/spinlock.h
>> @@ -302,4 +302,7 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
>>  #define _raw_read_relax(lock)	cpu_relax()
>>  #define _raw_write_relax(lock)	cpu_relax()
>>  
>> +/* The {read|write|spin}_lock() on x86 are full memory barriers. */
>> +#define smp_mb__after_lock() do { } while (0)
> 
> Two small stylistic comments, please make this an inline function:
> 
> static inline void smp_mb__after_lock(void) { }
> #define smp_mb__after_lock
> 
> (untested)
> 
>> +/* The lock does not imply full memory barrier. */
>> +#ifndef smp_mb__after_lock
>> +#define smp_mb__after_lock() smp_mb()
>> +#endif
> 
> ditto.
> 
> 	Ingo

This was following existing implementations of various smp_mb__??? helpers :

# grep -4 smp_mb__before_clear_bit include/asm-generic/bitops.h

/*
 * clear_bit may not imply a memory barrier
 */
#ifndef smp_mb__before_clear_bit
#define smp_mb__before_clear_bit()      smp_mb()
#define smp_mb__after_clear_bit()       smp_mb()
#endif


Ingo Molnar July 3, 2009, 9:24 a.m. UTC | #3
* Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Ingo Molnar a écrit :
> > * Jiri Olsa <jolsa@redhat.com> wrote:
> > 
> >> +++ b/arch/x86/include/asm/spinlock.h
> >> @@ -302,4 +302,7 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
> >>  #define _raw_read_relax(lock)	cpu_relax()
> >>  #define _raw_write_relax(lock)	cpu_relax()
> >>  
> >> +/* The {read|write|spin}_lock() on x86 are full memory barriers. */
> >> +#define smp_mb__after_lock() do { } while (0)
> > 
> > Two small stylistic comments, please make this an inline function:
> > 
> > static inline void smp_mb__after_lock(void) { }
> > #define smp_mb__after_lock
> > 
> > (untested)
> > 
> >> +/* The lock does not imply full memory barrier. */
> >> +#ifndef smp_mb__after_lock
> >> +#define smp_mb__after_lock() smp_mb()
> >> +#endif
> > 
> > ditto.
> > 
> > 	Ingo
> 
> This was following existing implementations of various smp_mb__??? helpers :
> 
> # grep -4 smp_mb__before_clear_bit include/asm-generic/bitops.h
> 
> /*
>  * clear_bit may not imply a memory barrier
>  */
> #ifndef smp_mb__before_clear_bit
> #define smp_mb__before_clear_bit()      smp_mb()
> #define smp_mb__after_clear_bit()       smp_mb()
> #endif

Did i mention that those should be fixed too? :-)

	Ingo
Jiri Olsa July 3, 2009, 9:56 a.m. UTC | #4
On Fri, Jul 03, 2009 at 11:24:38AM +0200, Ingo Molnar wrote:
> 
> * Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Ingo Molnar a écrit :
> > > * Jiri Olsa <jolsa@redhat.com> wrote:
> > > 
> > >> +++ b/arch/x86/include/asm/spinlock.h
> > >> @@ -302,4 +302,7 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
> > >>  #define _raw_read_relax(lock)	cpu_relax()
> > >>  #define _raw_write_relax(lock)	cpu_relax()
> > >>  
> > >> +/* The {read|write|spin}_lock() on x86 are full memory barriers. */
> > >> +#define smp_mb__after_lock() do { } while (0)
> > > 
> > > Two small stylistic comments, please make this an inline function:
> > > 
> > > static inline void smp_mb__after_lock(void) { }
> > > #define smp_mb__after_lock
> > > 
> > > (untested)
> > > 
> > >> +/* The lock does not imply full memory barrier. */
> > >> +#ifndef smp_mb__after_lock
> > >> +#define smp_mb__after_lock() smp_mb()
> > >> +#endif
> > > 
> > > ditto.
> > > 
> > > 	Ingo
> > 
> > This was following existing implementations of various smp_mb__??? helpers :
> > 
> > # grep -4 smp_mb__before_clear_bit include/asm-generic/bitops.h
> > 
> > /*
> >  * clear_bit may not imply a memory barrier
> >  */
> > #ifndef smp_mb__before_clear_bit
> > #define smp_mb__before_clear_bit()      smp_mb()
> > #define smp_mb__after_clear_bit()       smp_mb()
> > #endif
> 
> Did i mention that those should be fixed too? :-)
> 
> 	Ingo

ok, could I include it in the 2/2, or do you prefer a separate patch?

jirka
Ingo Molnar July 3, 2009, 10:25 a.m. UTC | #5
* Jiri Olsa <jolsa@redhat.com> wrote:

> On Fri, Jul 03, 2009 at 11:24:38AM +0200, Ingo Molnar wrote:
> > 
> > * Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > > Ingo Molnar a écrit :
> > > > * Jiri Olsa <jolsa@redhat.com> wrote:
> > > > 
> > > >> +++ b/arch/x86/include/asm/spinlock.h
> > > >> @@ -302,4 +302,7 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
> > > >>  #define _raw_read_relax(lock)	cpu_relax()
> > > >>  #define _raw_write_relax(lock)	cpu_relax()
> > > >>  
> > > >> +/* The {read|write|spin}_lock() on x86 are full memory barriers. */
> > > >> +#define smp_mb__after_lock() do { } while (0)
> > > > 
> > > > Two small stylistic comments, please make this an inline function:
> > > > 
> > > > static inline void smp_mb__after_lock(void) { }
> > > > #define smp_mb__after_lock
> > > > 
> > > > (untested)
> > > > 
> > > >> +/* The lock does not imply full memory barrier. */
> > > >> +#ifndef smp_mb__after_lock
> > > >> +#define smp_mb__after_lock() smp_mb()
> > > >> +#endif
> > > > 
> > > > ditto.
> > > > 
> > > > 	Ingo
> > > 
> > > This was following existing implementations of various smp_mb__??? helpers :
> > > 
> > > # grep -4 smp_mb__before_clear_bit include/asm-generic/bitops.h
> > > 
> > > /*
> > >  * clear_bit may not imply a memory barrier
> > >  */
> > > #ifndef smp_mb__before_clear_bit
> > > #define smp_mb__before_clear_bit()      smp_mb()
> > > #define smp_mb__after_clear_bit()       smp_mb()
> > > #endif
> > 
> > Did i mention that those should be fixed too? :-)
> > 
> > 	Ingo
> 
> ok, could I include it in the 2/2, or do you prefer a separate patch?

depends on whether it will regress ;-)

If it regresses, it's better to have it separate. If it won't, it can 
be included. If unsure, default to the more conservative option.

	Ingo
Mathieu Desnoyers July 3, 2009, 2:04 p.m. UTC | #6
* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Jiri Olsa <jolsa@redhat.com> wrote:
> 
> > +++ b/arch/x86/include/asm/spinlock.h
> > @@ -302,4 +302,7 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
> >  #define _raw_read_relax(lock)	cpu_relax()
> >  #define _raw_write_relax(lock)	cpu_relax()
> >  
> > +/* The {read|write|spin}_lock() on x86 are full memory barriers. */
> > +#define smp_mb__after_lock() do { } while (0)
> 

Hm. Looking at http://lkml.org/lkml/2009/6/23/192, a very basic question
comes to mind:

Why don't we create a read_lock without acquire semantic instead (e.g.
read_lock_nomb(), or something with a better name like __read_lock()) ?
On architectures where memory barriers are needed to provide the acquire
semantic, it would be faster to do :

__read_lock();
smp_mb();

than :

read_lock(); <- e.g. lwsync + isync or something like that
smp_mb(); <- full sync.

Second point : __add_wait_queue/waitqueue_active/wake_up_interruptible
would probably benefit from adding comments about their combined use
with other checks and how nice memory barriers are.
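
For reference, a generic sketch of the combined use being alluded to (not
code from this patch; the flag and queue names are illustrative):

	/* sleeper */			/* waker */
	__add_wait_queue(q, &wait);	flag = 1;
	smp_mb();			smp_mb();
	if (!flag)			if (waitqueue_active(q))
		schedule();			wake_up_interruptible(q);

Without the pair of barriers, the sleeper's flag test and the waker's
waitqueue_active() test could each miss the other side's update, and the
wakeup would be lost.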

Mathieu


> Two small stylistic comments, please make this an inline function:
> 
> static inline void smp_mb__after_lock(void) { }
> #define smp_mb__after_lock
> 
> (untested)
> 
> > +/* The lock does not imply full memory barrier. */
> > +#ifndef smp_mb__after_lock
> > +#define smp_mb__after_lock() smp_mb()
> > +#endif
> 
> ditto.
> 
> 	Ingo
Herbert Xu July 3, 2009, 3:29 p.m. UTC | #7
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> Why don't we create a read_lock without acquire semantic instead (e.g.
> read_lock_nomb(), or something with a better name like __read_lock()) ?
> On architectures where memory barriers are needed to provide the acquire
> semantic, it would be faster to do :
> 
> __read_lock();
> smp_mb();
> 
> than :
> 
> read_lock(); <- e.g. lwsync + isync or something like that
> smp_mb(); <- full sync.

Hmm, why do we even care when read_lock should just die?

Cheers,
Eric Dumazet July 3, 2009, 3:37 p.m. UTC | #8
Herbert Xu a écrit :
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
>> Why don't we create a read_lock without acquire semantic instead (e.g.
>> read_lock_nomb(), or something with a better name like __read_lock()) ?
>> On architectures where memory barriers are needed to provide the acquire
>> semantic, it would be faster to do :
>>
>> __read_lock();
>> smp_mb();
>>
>> than :
>>
>> read_lock(); <- e.g. lwsync + isync or something like that
>> smp_mb(); <- full sync.
> 
> Hmm, why do we even care when read_lock should just die?
> 
> Cheers,

+1 :)

Do you mean using a spinlock instead or what ?

Also, how many arches are able to have a true __read_lock()
(or __spin_lock() if that matters), without acquire semantic ?
Mathieu Desnoyers July 3, 2009, 3:40 p.m. UTC | #9
* Herbert Xu (herbert@gondor.apana.org.au) wrote:
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > 
> > Why don't we create a read_lock without acquire semantic instead (e.g.
> > read_lock_nomb(), or something with a better name like __read_lock()) ?
> > On architectures where memory barriers are needed to provide the acquire
> > semantic, it would be faster to do :
> > 
> > __read_lock();
> > smp_mb();
> > 
> > than :
> > 
> > read_lock(); <- e.g. lwsync + isync or something like that
> > smp_mb(); <- full sync.
> 
> Hmm, why do we even care when read_lock should just die?
> 

I guess you are proposing migration from rwlock to RCU ?

Well, in cases where critical sections are in the order of 20000
cycles or more, and with 8 to 64-core machines, there is no significant
gain in using RCU vs rwlocks, so the added complexity might not be
justified if the critical sections are very long.

But then it's a case by case thing. We would have to see what exactly is
protected by this read lock and how long the critical section is.
However, in any case, you are right: rwlocks are acceptable only for
long critical sections, for which we just don't care about one extra
memory barrier.

Instead of optimizing away these read-side rwlock barriers, we would
spend our time much more efficiently switching to RCU read side.

Mathieu

> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Mathieu Desnoyers July 3, 2009, 3:47 p.m. UTC | #10
* Eric Dumazet (eric.dumazet@gmail.com) wrote:
> Herbert Xu a écrit :
> > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> >> Why don't we create a read_lock without acquire semantic instead (e.g.
> >> read_lock_nomb(), or something with a better name like __read_lock()) ?
> >> On architectures where memory barriers are needed to provide the acquire
> >> semantic, it would be faster to do :
> >>
> >> __read_lock();
> >> smp_mb();
> >>
> >> than :
> >>
> >> read_lock(); <- e.g. lwsync + isync or something like that
> >> smp_mb(); <- full sync.
> > 
> > Hmm, why do we even care when read_lock should just die?
> > 
> > Cheers,
> 
> +1 :)
> 
> Do you mean using a spinlock instead or what ?
> 

I think he meant RCU.

> Also, how many arches are able to have a true __read_lock()
> (or __spin_lock() if that matters), without acquire semantic ?

At least PowerPC, MIPS, recent ARM, alpha.

Mathieu
Paul E. McKenney July 3, 2009, 5:06 p.m. UTC | #11
On Fri, Jul 03, 2009 at 11:47:00AM -0400, Mathieu Desnoyers wrote:
> * Eric Dumazet (eric.dumazet@gmail.com) wrote:
> > Herbert Xu a écrit :
> > > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > >> Why don't we create a read_lock without acquire semantic instead (e.g.
> > >> read_lock_nomb(), or something with a better name like __read_lock()) ?
> > >> On architectures where memory barriers are needed to provide the acquire
> > >> semantic, it would be faster to do :
> > >>
> > >> __read_lock();
> > >> smp_mb();
> > >>
> > >> than :
> > >>
> > >> read_lock(); <- e.g. lwsync + isync or something like that
> > >> smp_mb(); <- full sync.
> > > 
> > > Hmm, why do we even care when read_lock should just die?
> > > 
> > > Cheers,
> > 
> > +1 :)
> > 
> > Do you mean using a spinlock instead or what ?
> > 
> 
> I think he meant RCU.
> 
> > Also, how many arches are able to have a true __read_lock()
> > (or __spin_lock() if that matters), without acquire semantic ?
> 
> At least PowerPC, MIPS, recent ARM, alpha.

Are you guys sure you are in agreement about what you all mean by
"acquire semantics"?

Clearly, any correct __read_lock() implementation must enforce ordering
with respect to the most recent __write_unlock(), but this does not
necessarily imply all possible definitions of "acquire semantics".

							Thanx, Paul
Mathieu Desnoyers July 3, 2009, 5:31 p.m. UTC | #12
* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Fri, Jul 03, 2009 at 11:47:00AM -0400, Mathieu Desnoyers wrote:
> > * Eric Dumazet (eric.dumazet@gmail.com) wrote:
> > > Herbert Xu a écrit :
> > > > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > > >> Why don't we create a read_lock without acquire semantic instead (e.g.
> > > >> read_lock_nomb(), or something with a better name like __read_lock()) ?
> > > >> On architectures where memory barriers are needed to provide the acquire
> > > >> semantic, it would be faster to do :
> > > >>
> > > >> __read_lock();
> > > >> smp_mb();
> > > >>
> > > >> than :
> > > >>
> > > >> read_lock(); <- e.g. lwsync + isync or something like that
> > > >> smp_mb(); <- full sync.
> > > > 
> > > > Hmm, why do we even care when read_lock should just die?
> > > > 
> > > > Cheers,
> > > 
> > > +1 :)
> > > 
> > > Do you mean using a spinlock instead or what ?
> > > 
> > 
> > I think he meant RCU.
> > 
> > > Also, how many arches are able to have a true __read_lock()
> > > (or __spin_lock() if that matters), without acquire semantic ?
> > 
> > At least PowerPC, MIPS, recent ARM, alpha.
> 
> Are you guys sure you are in agreement about what you all mean by
> "acquire semantics"?
> 

I use acquire/release semantic with the following meaning :

...
read A
read_unlock();

read B

read_lock();
read C

read_unlock would provide release semantic by disallowing read A to move
after the read_unlock.

read_lock would provide acquire semantic by disallowing read C to move
before read_lock.

read B is free to move.


> Clearly, any correct __read_lock() implementation must enforce ordering
> with respect to the most recent __write_unlock(), but this does not
> necesarily imply all possible definitions of "acquire semantics".
> 

Yes, you are right. We could never remove _all_ memory barriers from
__read_lock()/__read_unlock implementations even if we require something
such as :

__read_lock()
smp_mb()

critical section.

smp_mb()
__read_unlock()

Because we also need to guarantee that consecutive unlock/lock won't be
reordered, which implies a barrier _outside_ of the read lock/unlock
atomic operations.
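
Sketching the problematic case (an illustration of the argument above, not
proposed code):

	read A
	smp_mb();
	__read_unlock(); <- without a barrier between these two atomic
	__read_lock();      operations, they are free to reorder
	smp_mb();
	read B

The inner barriers order A and B against their own critical sections, but
nothing orders the unlock against the following lock.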

But anyway I'm not sure it's worth trying to optimize rwlocks, given
that for critical sections where the performance hit of a memory barrier
would be perceivable, we should really think about using RCU rather than
beating this dead horse. :)

Thanks,

Mathieu.


> 							Thanx, Paul

Patch

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index b7e5db8..39ecc5f 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -302,4 +302,7 @@  static inline void __raw_write_unlock(raw_rwlock_t *rw)
 #define _raw_read_relax(lock)	cpu_relax()
 #define _raw_write_relax(lock)	cpu_relax()
 
+/* The {read|write|spin}_lock() on x86 are full memory barriers. */
+#define smp_mb__after_lock() do { } while (0)
+
 #endif /* _ASM_X86_SPINLOCK_H */
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 252b245..ae053bd 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -132,6 +132,11 @@  do {								\
 #endif /*__raw_spin_is_contended*/
 #endif
 
+/* The lock does not imply full memory barrier. */
+#ifndef smp_mb__after_lock
+#define smp_mb__after_lock() smp_mb()
+#endif
+
 /**
  * spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/include/net/sock.h b/include/net/sock.h
index 4eb8409..98afcd9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1271,6 +1271,9 @@  static inline int sk_has_allocations(const struct sock *sk)
  * in its cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1
  * could then end up calling schedule and sleep forever if there are no more
  * data on the socket.
+ *
+ * The sk_has_sleeper is always called right after a call to read_lock, so we
+ * can use smp_mb__after_lock barrier.
  */
 static inline int sk_has_sleeper(struct sock *sk)
 {
@@ -1280,7 +1283,7 @@  static inline int sk_has_sleeper(struct sock *sk)
 	 *
 	 * This memory barrier is paired in the sock_poll_wait.
 	 */
-	smp_mb();
+	smp_mb__after_lock();
 	return sk->sk_sleep && waitqueue_active(sk->sk_sleep);
 }
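
The barrier above is documented as being paired in sock_poll_wait(), which
is introduced by patch 1/2 of this series and is not part of this diff; a
sketch of its shape, reproduced here as an assumption rather than verbatim:

	static inline void sock_poll_wait(struct file *filp,
			wait_queue_head_t *wait_address, poll_table *p)
	{
		if (p && wait_address) {
			poll_wait(filp, wait_address, p);
			/*
			 * This memory barrier is paired in the
			 * sk_has_sleeper.
			 */
			smp_mb();
		}
	}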