Message ID | 20081012054634.GA12535@wotan.suse.de (mailing list archive) |
---|---|
State | Awaiting Upstream, archived |
Delegated to: | Paul Mackerras |
On Sun, 2008-10-12 at 07:46 +0200, Nick Piggin wrote:
> Speed up generic mutex implementations.
>
> - atomic operations which both modify the variable and return something imply
>   full smp memory barriers before and after the memory operations involved
>   (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
>   they don't modify the target). See Documentation/atomic_ops.txt.
>   So remove extra barriers and branches.
>
> - All architectures support atomic_cmpxchg. This has no relation to
>   __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally
>
> This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
> to 203 cycles on a ppc970 system.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>

Looks ok.

Cheers,
Ben.
* Nick Piggin <npiggin@suse.de> wrote:

> Speed up generic mutex implementations.
>
> - atomic operations which both modify the variable and return something imply
>   full smp memory barriers before and after the memory operations involved
>   (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
>   they don't modify the target). See Documentation/atomic_ops.txt.
>   So remove extra barriers and branches.
>
> - All architectures support atomic_cmpxchg. This has no relation to
>   __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally
>
> This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
> to 203 cycles on a ppc970 system.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>

No objections here. Let's merge these two patches via the ppc tree, so
that they get testing on real hardware as well?

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo
Nick Piggin <npiggin@suse.de> wrote:

> Speed up generic mutex implementations.
>
> - atomic operations which both modify the variable and return something imply
>   full smp memory barriers before and after the memory operations involved
>   (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
>   they don't modify the target). See Documentation/atomic_ops.txt.
>   So remove extra barriers and branches.
>
> - All architectures support atomic_cmpxchg. This has no relation to
>   __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally
>
> This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
> to 203 cycles on a ppc970 system.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>

This seems to work on FRV, which uses the mutex-dec generic algorithm, though
you have to take that with a pinch of salt as I don't have SMP hardware for it.

Acked-by: David Howells <dhowells@redhat.com>
On Wed, 2008-10-22 at 17:59 +0200, Ingo Molnar wrote:
> * Nick Piggin <npiggin@suse.de> wrote:
>
> > Speed up generic mutex implementations.
> >
> > - atomic operations which both modify the variable and return something imply
> >   full smp memory barriers before and after the memory operations involved
> >   (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
> >   they don't modify the target). See Documentation/atomic_ops.txt.
> >   So remove extra barriers and branches.
> >
> > - All architectures support atomic_cmpxchg. This has no relation to
> >   __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally
> >
> > This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
> > to 203 cycles on a ppc970 system.
> >
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
>
> no objections here. Lets merge these two patches via the ppc tree, so
> that it gets testing on real hardware as well?
>
> Acked-by: Ingo Molnar <mingo@elte.hu>

All right, but in that case it will be after -rc1 unless I manage to sneak
something in tomorrow before Linus closes the merge window.

I can't get an update in today.

Cheers,
Ben.
On Thu, Oct 23, 2008 at 03:43:58PM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2008-10-22 at 17:59 +0200, Ingo Molnar wrote:
> > * Nick Piggin <npiggin@suse.de> wrote:
> >
> > > Speed up generic mutex implementations.
> > >
> > > - atomic operations which both modify the variable and return something imply
> > >   full smp memory barriers before and after the memory operations involved
> > >   (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
> > >   they don't modify the target). See Documentation/atomic_ops.txt.
> > >   So remove extra barriers and branches.
> > >
> > > - All architectures support atomic_cmpxchg. This has no relation to
> > >   __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally
> > >
> > > This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
> > > to 203 cycles on a ppc970 system.
> > >
> > > Signed-off-by: Nick Piggin <npiggin@suse.de>
> >
> > no objections here. Lets merge these two patches via the ppc tree, so
> > that it gets testing on real hardware as well?
> >
> > Acked-by: Ingo Molnar <mingo@elte.hu>
>
> Allright but in that case it will be after -rc1 unless I manage to sneak
> something in tomorrow before linux closes the merge window.
>
> I can't get an update today.

Fine with me.

Thanks,
Nick
Index: linux-2.6/include/asm-generic/mutex-dec.h
===================================================================
--- linux-2.6.orig/include/asm-generic/mutex-dec.h
+++ linux-2.6/include/asm-generic/mutex-dec.h
@@ -22,8 +22,6 @@ __mutex_fastpath_lock(atomic_t *count, v
 {
 	if (unlikely(atomic_dec_return(count) < 0))
 		fail_fn(count);
-	else
-		smp_mb();
 }
 
 /**
@@ -41,10 +39,7 @@ __mutex_fastpath_lock_retval(atomic_t *c
 {
 	if (unlikely(atomic_dec_return(count) < 0))
 		return fail_fn(count);
-	else {
-		smp_mb();
-		return 0;
-	}
+	return 0;
 }
 
 /**
@@ -63,7 +58,6 @@ __mutex_fastpath_lock_retval(atomic_t *c
 static inline void
 __mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
 {
-	smp_mb();
 	if (unlikely(atomic_inc_return(count) <= 0))
 		fail_fn(count);
 }
@@ -98,15 +92,9 @@ __mutex_fastpath_trylock(atomic_t *count
 	 * just as efficient (and simpler) as a 'destructive' probing of
 	 * the mutex state would be.
 	 */
-#ifdef __HAVE_ARCH_CMPXCHG
-	if (likely(atomic_cmpxchg(count, 1, 0) == 1)) {
-		smp_mb();
+	if (likely(atomic_cmpxchg(count, 1, 0) == 1))
 		return 1;
-	}
 	return 0;
-#else
-	return fail_fn(count);
-#endif
 }
 
 #endif
Index: linux-2.6/include/asm-generic/mutex-xchg.h
===================================================================
--- linux-2.6.orig/include/asm-generic/mutex-xchg.h
+++ linux-2.6/include/asm-generic/mutex-xchg.h
@@ -27,8 +27,6 @@ __mutex_fastpath_lock(atomic_t *count, v
 {
 	if (unlikely(atomic_xchg(count, 0) != 1))
 		fail_fn(count);
-	else
-		smp_mb();
 }
 
 /**
@@ -46,10 +44,7 @@ __mutex_fastpath_lock_retval(atomic_t *c
 {
 	if (unlikely(atomic_xchg(count, 0) != 1))
 		return fail_fn(count);
-	else {
-		smp_mb();
-		return 0;
-	}
+	return 0;
 }
 
 /**
@@ -67,7 +62,6 @@ __mutex_fastpath_lock_retval(atomic_t *c
 static inline void
 __mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
 {
-	smp_mb();
 	if (unlikely(atomic_xchg(count, 1) != 0))
 		fail_fn(count);
 }
@@ -110,7 +104,6 @@ __mutex_fastpath_trylock(atomic_t *count
 		if (prev < 0)
 			prev = 0;
 	}
-	smp_mb();
 
 	return prev;
 }
Speed up generic mutex implementations.

- atomic operations which both modify the variable and return something imply
  full smp memory barriers before and after the memory operations involved
  (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
  they don't modify the target). See Documentation/atomic_ops.txt.
  So remove extra barriers and branches.

- All architectures support atomic_cmpxchg. This has no relation to
  __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally

This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
to 203 cycles on a ppc970 system.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
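
For readers without a kernel tree at hand, the shape of the mutex-dec fastpaths after this patch can be sketched in userspace with C11 atomics. This is only an analogue under stated assumptions, not the kernel code: `sketch_mutex`, `sketch_slowpath` and the `sketch_fastpath_*` names are hypothetical, and C11 sequentially consistent read-modify-write operations stand in for the kernel's value-returning atomics, whose barrier guarantees are defined by Documentation/atomic_ops.txt rather than by the C11 memory model. The point it illustrates is that no separate fence follows or precedes the atomic operation on any path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mutex count:
 * 1 = unlocked, 0 = locked, negative = locked with possible waiters. */
typedef struct {
    atomic_int count;
} sketch_mutex;

/* Hypothetical slowpath; here it only records that it was called. */
static int sketch_slowpath_calls;

static void sketch_slowpath(sketch_mutex *m)
{
    (void)m;
    sketch_slowpath_calls++;
}

/* Analogue of __mutex_fastpath_lock: atomic_fetch_sub returns the old
 * value, so old - 1 plays the role of atomic_dec_return(). The seq_cst
 * read-modify-write is the only ordering operation on this path. */
static inline void
sketch_fastpath_lock(sketch_mutex *m, void (*fail_fn)(sketch_mutex *))
{
    if (atomic_fetch_sub(&m->count, 1) - 1 < 0)
        fail_fn(m);
}

/* Analogue of __mutex_fastpath_unlock: old + 1 plays the role of
 * atomic_inc_return(); no explicit fence before the increment. */
static inline void
sketch_fastpath_unlock(sketch_mutex *m, void (*fail_fn)(sketch_mutex *))
{
    if (atomic_fetch_add(&m->count, 1) + 1 <= 0)
        fail_fn(m);
}

/* Analogue of __mutex_fastpath_trylock: one unconditional
 * compare-and-swap from 1 (unlocked) to 0 (locked). */
static inline bool
sketch_fastpath_trylock(sketch_mutex *m)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&m->count, &expected, 0);
}
```

Initialising `count` to 1 with `atomic_init()` and calling lock followed by unlock should leave the count at 1 without entering the slowpath; a second lock while the mutex is held drives the count negative and calls `fail_fn`.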