Message ID | 1454326614.4592.293.camel@localhost.localdomain
State      | New
On 01/02/16 11:36, Torvald Riegel wrote:
> On Sun, 2016-01-31 at 15:09 -0800, Paul Pluzhnikov wrote:
>> On Mon, Jan 25, 2016 at 5:06 AM, Torvald Riegel <triegel@redhat.com> wrote:
>>
>>> For the spinlocks, I'd really prefer if we could just remove the asm
>>> files. The generic implementation should basically produce the same
>>> code; if not, we should try to fix that instead of keeping the asm
>>> files.
>>
>> Using gcc-4.8.4 (4.8.4-2ubuntu1~14.04):
>>
>> $ objdump -d nptl/pthread_spin_unlock.o
>>
>> nptl/pthread_spin_unlock.o:     file format elf32-i386
>>
>>
>> Disassembly of section .text:
>>
>> 00000000 <pthread_spin_unlock>:
>>    0:   f0 83 0c 24 00          lock orl $0x0,(%esp)
>>    5:   8b 44 24 04             mov  0x4(%esp),%eax
>>    9:   c7 00 00 00 00 00       movl $0x0,(%eax)
>>    f:   31 c0                   xor  %eax,%eax
>>   11:   c3                      ret
>>
>> This isn't quite the same as sysdeps/i386/nptl/pthread_spin_unlock.S
>
> This is because nptl/pthread_spin_unlock.c still issues a full barrier.
> If this is changed to an atomic_store_release, one gets this on x86_64:
>
> 0000000000000000 <pthread_spin_unlock>:
>    0:   c7 07 00 00 00 00       movl $0x0,(%rdi)
>    6:   31 c0                   xor  %eax,%eax
>    8:   c3
>
> Perhaps now is a good time to finally get this done. Most archs are
> already using acquire semantics, IIRC. I think aarch64 and arm are the
> only major ones that happen to use the current generic unlock with full
> barrier -- but they only use acquire MO on unlock, so there's really no
> consistent pattern that would justify this.

i think mb(); store(); is actually *weaker* than store_release();
and thus on some hw it might be a bit faster, but i'm not against
changing to store_release (that's one step closer to posix semantics).

(full barrier is weaker here because store_release() has to
prevent reordering wrt load_acquire in *both* directions, so
it may be implemented by the hw like mb(); store(); mb();
although that's not the most efficient implementation..)
> Note that there's an ongoing debate about whether POSIX requires
> pthread_spin_unlock to be a full barrier, whether it should or should

the current unlock is not enough for posix if trylock is
acquire MO:

T1:
unlock(l1);
if (trylock(l2))...

T2:
unlock(l2);
if (trylock(l1))...

with one sided barrier, both trylock can fail to grab
the lock (the loads are not guaranteed to happen after
the stores) which is not seq cst, this does not happen
with release MO unlock.
On 01/02/16 12:03, Szabolcs Nagy wrote:
> the current unlock is not enough for posix if trylock is
> acquire MO:
>
> T1:
> unlock(l1);
> if (trylock(l2))...
>
> T2:
> unlock(l2);
> if (trylock(l1))...
>
> with one sided barrier, both trylock can fail to grab
> the lock (the loads are not guaranteed to happen after
> the stores) which is not seq cst, this does not happen
> with release MO unlock.

sorry, acquire/release MO is not enough to fix this in c11,
on the hw level i believe it is enough with arm memory model.
On Mon, 2016-02-01 at 12:03 +0000, Szabolcs Nagy wrote:
> On 01/02/16 11:36, Torvald Riegel wrote:
> > On Sun, 2016-01-31 at 15:09 -0800, Paul Pluzhnikov wrote:
> >> On Mon, Jan 25, 2016 at 5:06 AM, Torvald Riegel <triegel@redhat.com> wrote:
> >>
> >>> For the spinlocks, I'd really prefer if we could just remove the asm
> >>> files. The generic implementation should basically produce the same
> >>> code; if not, we should try to fix that instead of keeping the asm
> >>> files.
> >>
> >> Using gcc-4.8.4 (4.8.4-2ubuntu1~14.04):
> >>
> >> $ objdump -d nptl/pthread_spin_unlock.o
> >>
> >> nptl/pthread_spin_unlock.o:     file format elf32-i386
> >>
> >>
> >> Disassembly of section .text:
> >>
> >> 00000000 <pthread_spin_unlock>:
> >>    0:   f0 83 0c 24 00          lock orl $0x0,(%esp)
> >>    5:   8b 44 24 04             mov  0x4(%esp),%eax
> >>    9:   c7 00 00 00 00 00       movl $0x0,(%eax)
> >>    f:   31 c0                   xor  %eax,%eax
> >>   11:   c3                      ret
> >>
> >> This isn't quite the same as sysdeps/i386/nptl/pthread_spin_unlock.S
> >
> > This is because nptl/pthread_spin_unlock.c still issues a full barrier.
> > If this is changed to an atomic_store_release, one gets this on x86_64:
> >
> > 0000000000000000 <pthread_spin_unlock>:
> >    0:   c7 07 00 00 00 00       movl $0x0,(%rdi)
> >    6:   31 c0                   xor  %eax,%eax
> >    8:   c3
> >
> > Perhaps now is a good time to finally get this done. Most archs are
> > already using acquire semantics, IIRC. I think aarch64 and arm are the
> > only major ones that happen to use the current generic unlock with full
> > barrier -- but they only use acquire MO on unlock, so there's really no
> > consistent pattern that would justify this.
>
> i think mb(); store(); is actually *weaker* than store_release();

If that's indeed the case in the context of the C11 memory model, this
is a bug. But I would be surprised if that's the case. It would also
be a bug if the atomic_full_barrier implementation we have currently is
actually not implementing a C11 seq_cst barrier.
Also cross-check against the mappings here, which I trust to be correct:
http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

> and thus on some hw it might be a bit faster, but i'm not against
> changing to store_release (that's one step closer to posix semantics).

In the context of the memory model used in glibc, store_release is
weaker than an atomic_full_barrier (which is supposed to be at least as
strong as a C11 seq_cst fence).

> (full barrier is weaker here because store_release() has to
> prevent reordering wrt load_acquire in *both* directions, so
> it may be implemented by the hw like mb(); store(); mb();
> although that's not the most efficient implementation..)

I'm not an expert on the ARM memory model, but I believe your assumption
that the semantics we require for atomic_store_release have to prevent
reordering in both directions on ARM is wrong. Even a compiler can often
move stuff from after to before a store_release; the release MO
guarantee is, simplified, something like "if something was before the
release MO store on the release side, it will not appear on the
observer's side as if after the release, provided the observer used an
acquire load to observe the release store".

> > Note that there's an ongoing debate about whether POSIX requires
> > pthread_spin_unlock to be a full barrier, whether it should or should
>
> the current unlock is not enough for posix if trylock is
> acquire MO:
>
> T1:
> unlock(l1);
> if (trylock(l2))...
>
> T2:
> unlock(l2);
> if (trylock(l1))...
>
> with one sided barrier, both trylock can fail to grab
> the lock (the loads are not guaranteed to happen after
> the stores) which is not seq cst, this does not happen
> with release MO unlock.

No. If unlock is a release MO store, and trylock is an acquire load,
then both trylocks can fail and both trylocks can succeed.
Your example is similar to Dekker synchronization, and Dekker
synchronization is never guaranteed to produce a winner, and
release/acquire are not sufficient to implement it. I suggest using the
cppmem tool to play around with it and have a look at the possible
executions.

If unlock is a seq_cst store and trylock is a seq_cst acquire, this
Dekker implementation would work except that POSIX doesn't guarantee
"synchronizes memory" for a call that fails (so the trylock isn't
sufficient, and you have to assume something like that it can fail
spuriously).

If unlock were an at-least-release MO fence followed by a relaxed MO
store to the lock followed by a seq_cst fence, this example would work.
But this shows, in turn, that (a) "synchronizes memory" can be costly to
implement and (b) POSIX shouldn't try to support hacks that emulate
proper atomics (ie, trylock in the example above).
commit f9a5437b0c0150bac4c5afd769dd6eba09fed1de
Author: Torvald Riegel <triegel@redhat.com>
Date:   Mon Feb 1 12:35:50 2016 +0100

    generic spinlock cleanup and x86_64 customization removal.

diff --git a/nptl/pthread_spin_lock.c b/nptl/pthread_spin_lock.c
index fb9bcc1..2209341 100644
--- a/nptl/pthread_spin_lock.c
+++ b/nptl/pthread_spin_lock.c
@@ -38,7 +38,7 @@ pthread_spin_lock (pthread_spinlock_t *lock)
      We assume that the first try mostly will be successful, and we use
      atomic_exchange.  For the subsequent tries we use
      atomic_compare_and_exchange.  */
-  if (atomic_exchange_acq (lock, 1) == 0)
+  if (__glibc_likely (atomic_exchange_acq (lock, 1) == 0))
     return 0;

   do
diff --git a/nptl/pthread_spin_unlock.c b/nptl/pthread_spin_unlock.c
index d4b63ac..ca534a0 100644
--- a/nptl/pthread_spin_unlock.c
+++ b/nptl/pthread_spin_unlock.c
@@ -23,7 +23,6 @@
 int
 pthread_spin_unlock (pthread_spinlock_t *lock)
 {
-  atomic_full_barrier ();
-  *lock = 0;
+  atomic_store_release (lock, 0);
   return 0;
 }
diff --git a/sysdeps/x86_64/nptl/pthread_spin_init.c b/sysdeps/x86_64/nptl/pthread_spin_init.c
deleted file mode 100644
index f249c6f..0000000
--- a/sysdeps/x86_64/nptl/pthread_spin_init.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/i386/nptl/pthread_spin_init.c>
diff --git a/sysdeps/x86_64/nptl/pthread_spin_lock.S b/sysdeps/x86_64/nptl/pthread_spin_lock.S
deleted file mode 100644
index b871241..0000000
--- a/sysdeps/x86_64/nptl/pthread_spin_lock.S
+++ /dev/null
@@ -1,34 +0,0 @@
-/* Copyright (C) 2012-2016 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <lowlevellock.h>
-#include <sysdep.h>
-
-ENTRY(pthread_spin_lock)
-1:	LOCK
-	decl	0(%rdi)
-	jne	2f
-	xor	%eax, %eax
-	ret
-
-	.align	16
-2:	rep
-	nop
-	cmpl	$0, 0(%rdi)
-	jg	1b
-	jmp	2b
-END(pthread_spin_lock)
diff --git a/sysdeps/x86_64/nptl/pthread_spin_trylock.S b/sysdeps/x86_64/nptl/pthread_spin_trylock.S
deleted file mode 100644
index c9c5317..0000000
--- a/sysdeps/x86_64/nptl/pthread_spin_trylock.S
+++ /dev/null
@@ -1,37 +0,0 @@
-/* Copyright (C) 2002-2016 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <pthread-errnos.h>
-#include <sysdep.h>
-
-
-#ifdef UP
-# define LOCK
-#else
-# define LOCK lock
-#endif
-
-ENTRY(pthread_spin_trylock)
-	movl	$1, %eax
-	xorl	%ecx, %ecx
-	LOCK
-	cmpxchgl %ecx, (%rdi)
-	movl	$EBUSY, %eax
-	cmovel	%ecx, %eax
-	retq
-END(pthread_spin_trylock)
diff --git a/sysdeps/x86_64/nptl/pthread_spin_unlock.S b/sysdeps/x86_64/nptl/pthread_spin_unlock.S
deleted file mode 100644
index 188de2e..0000000
--- a/sysdeps/x86_64/nptl/pthread_spin_unlock.S
+++ /dev/null
@@ -1,29 +0,0 @@
-/* Copyright (C) 2002-2016 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-ENTRY(pthread_spin_unlock)
-	movl	$1, (%rdi)
-	xorl	%eax, %eax
-	retq
-END(pthread_spin_unlock)
-
-	/* The implementation of pthread_spin_init is identical.  */
-	.globl	pthread_spin_init
-pthread_spin_init = pthread_spin_unlock