Message ID: 1363440569-17331-1-git-send-email-andi@firstfloor.org
State: New
On 16 March 2013 13:29, Andi Kleen wrote:
>
> With inline __attribute__((always_inline)) these functions
> get inlined even with -O0.
>
> I hardcoded the attribute in the header for now, assuming
> that all compilers that support libstdc++ have attribute
> always_inline too. If not it would need to be moved
> as a macro to c++config.h with appropriate ifdefs.

That should be fine. I assume __always_inline was chosen rather than
_GLIBCXX_ALWAYS_INLINE for consistency with glibc?
On 18 March 2013 16:28, Jonathan Wakely wrote:
> On 16 March 2013 13:29, Andi Kleen wrote:
>>
>> With inline __attribute__((always_inline)) these functions
>> get inlined even with -O0.
>>
>> I hardcoded the attribute in the header for now, assuming
>> that all compilers that support libstdc++ have attribute
>> always_inline too. If not it would need to be moved
>> as a macro to c++config.h with appropriate ifdefs.
>
> That should be fine. I assume __always_inline was chosen rather than
> _GLIBCXX_ALWAYS_INLINE for consistency with glibc?

Ah, I see it's also being used in libitm and libatomic, not just libstdc++.
On Mon, Mar 18, 2013 at 04:28:13PM +0000, Jonathan Wakely wrote:
> On 16 March 2013 13:29, Andi Kleen wrote:
> >
> > With inline __attribute__((always_inline)) these functions
> > get inlined even with -O0.
> >
> > I hardcoded the attribute in the header for now, assuming
> > that all compilers that support libstdc++ have attribute
> > always_inline too. If not it would need to be moved
> > as a macro to c++config.h with appropriate ifdefs.
>
> That should be fine. I assume __always_inline was chosen rather than
> _GLIBCXX_ALWAYS_INLINE for consistency with glibc?

Actually with the Linux kernel, but it's pretty arbitrary to be honest.

-Andi
On Mon, Mar 18, 2013 at 04:28:13PM +0000, Jonathan Wakely wrote:
> That should be fine. I assume __always_inline was chosen rather than
> _GLIBCXX_ALWAYS_INLINE for consistency with glibc?

Using __always_inline as the name of the macro is a bad idea; glibc
headers use that macro already. Just use something else in public headers
that aren't part of glibc.

	Jakub
> Using __always_inline as the name of the macro is a bad idea, glibc
> headers use that macro already. Just use something else in public headers
> that aren't part of glibc.

That's why I had the ifdef, but ok. I'll use __force_inline then.

-Andi
On Tue, Mar 19, 2013 at 08:51:21AM -0700, Andi Kleen wrote:
> > Using __always_inline as the name of the macro is a bad idea, glibc
> > headers use that macro already. Just use something else in public headers
> > that aren't part of glibc.
>
> That's why I had the ifdef, but ok. I'll use __force_inline then.

I'd say Jonathan's _GLIBCXX_ALWAYS_INLINE would be better.

BTW, have you verified always_inline works fine for -O0?

int x;

inline __attribute__((always_inline)) bool
test_and_set (int __m = 5)
{
  return __atomic_test_and_set (&x, __m);
}

int
foo (void)
{
  return test_and_set (65536 | 5);
}

with -O0 -mhle doesn't result in xacquire; guess for !optimize
get_memmodel would need to look through the chain of SSA_NAMEs if it is
given an SSA_NAME (which can appear because of inlining), looking for
INTEGER_CSTs.  If there is:

  _7 = 0x10005;
  _8 = _7;
  _9 = _8;
  __atomic_test_and_set (&x, _9);

it should still return 0x10005 rather than MEMMODEL_SEQ_CST.

	Jakub
On Tue, Mar 19, 2013 at 05:10:22PM +0100, Jakub Jelinek wrote:
> On Tue, Mar 19, 2013 at 08:51:21AM -0700, Andi Kleen wrote:
> > That's why I had the ifdef, but ok. I'll use __force_inline then.
>
> I'd say Jonathan's _GLIBCXX_ALWAYS_INLINE would be better.
>
> BTW, have you verified always_inline works fine for -O0?

I did some simple tests and it seemed to work.

But thanks for the counterexample.

> with -O0 -mhle doesn't result in xacquire, guess for !optimize
> get_memmodel would need to look through chain of SSA_NAMEs if SSA_NAME
> (that can appear because of inlining), looking for INTEGER_CSTs.

Interesting, so we need more fixes; I'll look into that. Is there
already a convenient helper for it?

There was an alternative approach of using a template, but I suppose
that would have the same problem.

-Andi
On Tue, Mar 19, 2013 at 6:30 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Tue, Mar 19, 2013 at 05:10:22PM +0100, Jakub Jelinek wrote:
>> with -O0 -mhle doesn't result in xacquire, guess for !optimize
>> get_memmodel would need to look through chain of SSA_NAMEs if SSA_NAME
>> (that can appear because of inlining), looking for INTEGER_CSTs.
>
> interesting. so need more fixes. i'll look into that. Is there
> already a convenient helper for it?

Not without using information created by TER, which is disabled for this
case because of different line information. If TER were not disabled
for this reason it would already work automagically. Eventually we can
relax this restriction for constants that are only fed through
temporaries ...

> There was an alternative approach of using a template, but I suppose
> that would have the same problem.

Possibly yes, but I guess the frontend substitutes without going through
an extra temporary (if all arguments are template parameters!)

Richard.
On Wed, Mar 20, 2013 at 11:38:03AM +0100, Richard Biener wrote:
> Not without using information created by TER which is disabled for this
> case because of different line information. If TER would not be disabled
> for this reason it would already work automagically.

Would relaxing that in TER for constants help in the case of:

static inline __attribute__((always_inline)) ...
foo (..., int m = __ATOMIC_SEQ_CST)
{
  if (something)
    bar ();
  else
    baz ();
  __atomic_test_and_set (&x, m);
}

void
test ()
{
  foo (..., __ATOMIC_HLE_ACQUIRE | __ATOMIC_SEQ_CST);
}

though? I'd think as the temp = 0x10005; would be in a different bb, TER
wouldn't do anything here; for -O1 of course CCP or similar would propagate
that, but for -O0 we'd still have to walk the chain of SSA_NAMEs.

	Jakub
On Wed, Mar 20, 2013 at 11:46 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Would relaxing that in TER for constants help in the case of:
> [...]
> though? I'd think as the temp = 0x10005; would be in a different bb, TER
> wouldn't do anything here, for -O1 of course CCP or similar would propagate
> that, but for -O0 we'd still have to walk the chain of SSA_NAMEs.

True. I was for a long time thinking of running a CCP pass at -O0 ...
and only restricting what we substitute into (builtin arguments and asm
operands).

Walking the SSA def chain would also be possible - preferably from
cfgexpand.c when we process the stmt. But you'll eventually lose
debug info like for

  int i = 1;
  __builtin_foo (i);

when we then never expand the SSA name def i_2 = 1 because we are
never visiting its use ...

Richard.
On Wed, Mar 20, 2013 at 12:20:27PM +0100, Richard Biener wrote:
> Walking the SSA def chain would also be possible - preferably from
> cfgexpand.c when we process the stmt. But you'll eventually lose
> debug info like for
>
>   int i = 1;
>   __builtin_foo (i);
>
> when we then never expand the SSA name def i_2 = 1 because we are
> never visiting its use ...

If i is a user variable, then for -O0 we'd better create a memory slot for
it and expand the i = 1 store as a store to that memory location.

	Jakub
On Wed, Mar 20, 2013 at 12:24 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> If i is a user variable, then for -O0 we'd better create a memory slot for
> it and expand the i = 1 store as store to that memory location.

Ok, we seem to do that...

Richard.
diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 609fe8b..475ffa0 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -37,6 +37,10 @@
 #include <stdint.h>
 #include <bits/atomic_lockfree_defines.h>
 
+#ifndef __always_inline
+#define __always_inline inline __attribute__((always_inline))
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -94,11 +98,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	     | (__m & __memory_order_modifier_mask));
   }
 
-  inline void
+  __always_inline void
   atomic_thread_fence(memory_order __m) noexcept
   { __atomic_thread_fence(__m); }
 
-  inline void
+  __always_inline void
   atomic_signal_fence(memory_order __m) noexcept
   { __atomic_signal_fence(__m); }
 
@@ -281,19 +285,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     : __atomic_flag_base{ _S_init(__i) }
     { }
 
-    bool
+    __always_inline bool
     test_and_set(memory_order __m = memory_order_seq_cst) noexcept
     {
       return __atomic_test_and_set (&_M_i, __m);
     }
 
-    bool
+    __always_inline bool
     test_and_set(memory_order __m = memory_order_seq_cst) volatile noexcept
     {
       return __atomic_test_and_set (&_M_i, __m);
    }
 
-    void
+    __always_inline void
     clear(memory_order __m = memory_order_seq_cst) noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -304,7 +308,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       __atomic_clear (&_M_i, __m);
     }
 
-    void
+    __always_inline void
     clear(memory_order __m = memory_order_seq_cst) volatile noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -463,7 +467,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     is_lock_free() const volatile noexcept
     { return __atomic_is_lock_free(sizeof(_M_i), nullptr); }
 
-    void
+    __always_inline void
     store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -474,7 +478,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       __atomic_store_n(&_M_i, __i, __m);
     }
 
-    void
+    __always_inline void
     store(__int_type __i,
	   memory_order __m = memory_order_seq_cst) volatile noexcept
     {
@@ -486,7 +490,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       __atomic_store_n(&_M_i, __i, __m);
     }
 
-    __int_type
+    __always_inline __int_type
     load(memory_order __m = memory_order_seq_cst) const noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -496,7 +500,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_load_n(&_M_i, __m);
     }
 
-    __int_type
+    __always_inline __int_type
     load(memory_order __m = memory_order_seq_cst) const volatile noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -506,7 +510,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_load_n(&_M_i, __m);
     }
 
-    __int_type
+    __always_inline __int_type
     exchange(__int_type __i,
	      memory_order __m = memory_order_seq_cst) noexcept
     {
@@ -514,14 +518,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
-    __int_type
+    __always_inline __int_type
     exchange(__int_type __i,
	      memory_order __m = memory_order_seq_cst) volatile noexcept
     {
       return __atomic_exchange_n(&_M_i, __i, __m);
     }
 
-    bool
+    __always_inline bool
     compare_exchange_weak(__int_type& __i1, __int_type __i2,
			   memory_order __m1, memory_order __m2) noexcept
     {
@@ -534,7 +538,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 1, __m1, __m2);
     }
 
-    bool
+    __always_inline bool
     compare_exchange_weak(__int_type& __i1, __int_type __i2,
			   memory_order __m1,
			   memory_order __m2) volatile noexcept
@@ -548,7 +552,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 1, __m1, __m2);
     }
 
-    bool
+    __always_inline bool
     compare_exchange_weak(__int_type& __i1, __int_type __i2,
			   memory_order __m = memory_order_seq_cst) noexcept
     {
@@ -556,7 +560,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
				     __cmpexch_failure_order(__m));
     }
 
-    bool
+    __always_inline bool
     compare_exchange_weak(__int_type& __i1, __int_type __i2,
		   memory_order __m = memory_order_seq_cst) volatile noexcept
     {
@@ -564,7 +568,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
				     __cmpexch_failure_order(__m));
     }
 
-    bool
+    __always_inline bool
     compare_exchange_strong(__int_type& __i1, __int_type __i2,
			     memory_order __m1, memory_order __m2) noexcept
     {
@@ -577,7 +581,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 0, __m1, __m2);
     }
 
-    bool
+    __always_inline bool
     compare_exchange_strong(__int_type& __i1, __int_type __i2,
			     memory_order __m1,
			     memory_order __m2) volatile noexcept
@@ -592,7 +596,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 0, __m1, __m2);
     }
 
-    bool
+    __always_inline bool
     compare_exchange_strong(__int_type& __i1, __int_type __i2,
			     memory_order __m = memory_order_seq_cst) noexcept
     {
@@ -600,7 +604,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
				     __cmpexch_failure_order(__m));
     }
 
-    bool
+    __always_inline bool
     compare_exchange_strong(__int_type& __i1, __int_type __i2,
		   memory_order __m = memory_order_seq_cst) volatile noexcept
     {
@@ -608,52 +612,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
				     __cmpexch_failure_order(__m));
     }
 
-    __int_type
+    __always_inline __int_type
     fetch_add(__int_type __i,
	       memory_order __m = memory_order_seq_cst) noexcept
     { return __atomic_fetch_add(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_add(__int_type __i,
	       memory_order __m = memory_order_seq_cst) volatile noexcept
     { return __atomic_fetch_add(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_sub(__int_type __i,
	       memory_order __m = memory_order_seq_cst) noexcept
     { return __atomic_fetch_sub(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_sub(__int_type __i,
	       memory_order __m = memory_order_seq_cst) volatile noexcept
     { return __atomic_fetch_sub(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_and(__int_type __i,
	       memory_order __m = memory_order_seq_cst) noexcept
     { return __atomic_fetch_and(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_and(__int_type __i,
	       memory_order __m = memory_order_seq_cst) volatile noexcept
     { return __atomic_fetch_and(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_or(__int_type __i,
	      memory_order __m = memory_order_seq_cst) noexcept
     { return __atomic_fetch_or(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_or(__int_type __i,
	      memory_order __m = memory_order_seq_cst) volatile noexcept
     { return __atomic_fetch_or(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_xor(__int_type __i,
	       memory_order __m = memory_order_seq_cst) noexcept
     { return __atomic_fetch_xor(&_M_i, __i, __m); }
 
-    __int_type
+    __always_inline __int_type
     fetch_xor(__int_type __i,
	       memory_order __m = memory_order_seq_cst) volatile noexcept
     { return __atomic_fetch_xor(&_M_i, __i, __m); }
@@ -770,7 +774,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     is_lock_free() const volatile noexcept
     { return __atomic_is_lock_free(_M_type_size(1), nullptr); }
 
-    void
+    __always_inline void
     store(__pointer_type __p,
	   memory_order __m = memory_order_seq_cst) noexcept
     {
@@ -783,7 +787,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       __atomic_store_n(&_M_p, __p, __m);
     }
 
-    void
+    __always_inline void
     store(__pointer_type __p,
	   memory_order __m = memory_order_seq_cst) volatile noexcept
     {
@@ -795,7 +799,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       __atomic_store_n(&_M_p, __p, __m);
     }
 
-    __pointer_type
+    __always_inline __pointer_type
     load(memory_order __m = memory_order_seq_cst) const noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -805,7 +809,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_load_n(&_M_p, __m);
     }
 
-    __pointer_type
+    __always_inline __pointer_type
     load(memory_order __m = memory_order_seq_cst) const volatile noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -815,7 +819,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_load_n(&_M_p, __m);
     }
 
-    __pointer_type
+    __always_inline __pointer_type
     exchange(__pointer_type __p,
	      memory_order __m = memory_order_seq_cst) noexcept
     {
@@ -823,14 +827,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
-    __pointer_type
+    __always_inline __pointer_type
     exchange(__pointer_type __p,
	      memory_order __m = memory_order_seq_cst) volatile noexcept
     {
       return __atomic_exchange_n(&_M_p, __p, __m);
     }
 
-    bool
+    __always_inline bool
     compare_exchange_strong(__pointer_type& __p1, __pointer_type __p2,
			     memory_order __m1,
			     memory_order __m2) noexcept
@@ -844,7 +848,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_compare_exchange_n(&_M_p, &__p1, __p2, 0, __m1, __m2);
     }
 
-    bool
+    __always_inline bool
     compare_exchange_strong(__pointer_type& __p1, __pointer_type __p2,
			     memory_order __m1,
			     memory_order __m2) volatile noexcept
@@ -859,22 +863,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __atomic_compare_exchange_n(&_M_p, &__p1, __p2, 0, __m1, __m2);
     }
 
-    __pointer_type
+    __always_inline __pointer_type
     fetch_add(ptrdiff_t __d,
	       memory_order __m = memory_order_seq_cst) noexcept
     { return __atomic_fetch_add(&_M_p, _M_type_size(__d), __m); }
 
-    __pointer_type
+    __always_inline __pointer_type
     fetch_add(ptrdiff_t __d,
	       memory_order __m = memory_order_seq_cst) volatile noexcept
     { return __atomic_fetch_add(&_M_p, _M_type_size(__d), __m); }
 
-    __pointer_type
+    __always_inline __pointer_type
     fetch_sub(ptrdiff_t __d,
	       memory_order __m = memory_order_seq_cst) noexcept
     { return __atomic_fetch_sub(&_M_p, _M_type_size(__d), __m); }
 
-    __pointer_type
+    __always_inline __pointer_type
     fetch_sub(ptrdiff_t __d,
	       memory_order __m = memory_order_seq_cst) volatile noexcept
     { return __atomic_fetch_sub(&_M_p, _M_type_size(__d), __m); }
From: Andi Kleen <ak@linux.intel.com>

When a non-constant memory model is passed to __atomic_*, gcc falls
back to seq_cst. This drops any HLE acquire or release bits.

This can happen when <atomic> is used with -O0, as the member functions
are not always inlined then and the memory order argument passed in ends
up being non-constant.

With inline __attribute__((always_inline)) these functions get inlined
even with -O0.

I hardcoded the attribute in the header for now, assuming that all
compilers that support libstdc++ have attribute always_inline too. If
not, it would need to be moved as a macro to c++config.h with
appropriate ifdefs.

We still need a warning for this case too; that will be submitted
separately.

I would like to have this patch in the 4.8 series if possible, to make
sure HLE works well with <atomic>.

Passed bootstrap and test on x86_64-linux.

libstdc++-v3/:
2013-03-15  Andi Kleen  <ak@linux.intel.com>

	PR target/55947
	* libstdc++-v3/include/bits/atomic_base.h (__always_inline):
	Add new macro.
	(atomic_thread_fence, atomic_signal_fence, test_and_set,
	clear, store, load, exchange, compare_exchange_weak,
	compare_exchange_strong, fetch_add, fetch_sub, fetch_and,
	fetch_or, fetch_xor): Mark __always_inline.
---
 libstdc++-v3/include/bits/atomic_base.h | 88 ++++++++++++++++---------------
 1 file changed, 46 insertions(+), 42 deletions(-)