Patchwork [1/4] Mark all member functions with memory models always inline

Submitter Andi Kleen
Date March 16, 2013, 1:29 p.m.
Message ID <1363440569-17331-1-git-send-email-andi@firstfloor.org>
Permalink /patch/228212/
State New

Comments

Andi Kleen - March 16, 2013, 1:29 p.m.
From: Andi Kleen <ak@linux.intel.com>

When a non-constant memory model is passed to an __atomic_* builtin,
gcc falls back to seq_cst. This drops any HLE acquire or release bits.

This can happen when <atomic> is used with -O0,
as the member functions are not always inlined then and the memory
order argument passed in ends up being non-constant.

With inline __attribute__((always_inline)) these functions
get inlined even with -O0.

I hardcoded the attribute in the header for now, assuming
that all compilers that support libstdc++ have attribute
always_inline too. If not, it would need to be moved
as a macro to c++config.h with appropriate ifdefs.

We still need a warning for this case too; that will be submitted
separately.

I would like to have this patch in the 4.8 series if possible, to make
sure HLE works well with <atomic>.

Passed bootstrap and test on x86_64-linux.

libstdc++-v3/:

2013-03-15  Andi Kleen  <ak@linux.intel.com>

	PR target/55947
	* libstdc++-v3/include/bits/atomic_base.h
	(__always_inline): Add new macro.
	(atomic_thread_fence, atomic_signal_fence, test_and_set,
	clear, store, load, exchange, compare_exchange_weak,
	compare_exchange_strong, fetch_add, fetch_sub, fetch_and,
	fetch_or, fetch_xor): Mark __always_inline.
---
 libstdc++-v3/include/bits/atomic_base.h |   88 ++++++++++++++++---------------
 1 file changed, 46 insertions(+), 42 deletions(-)
Jonathan Wakely - March 18, 2013, 4:28 p.m.
On 16 March 2013 13:29, Andi Kleen wrote:
>
> With inline __attribute__((always_inline)) these functions
> get inlined even with -O0.
>
> I hardcoded the attribute in the header for now, assuming
> that all compilers that support libstdc++ have attribute
> always_inline too. If not, it would need to be moved
> as a macro to c++config.h with appropriate ifdefs.

That should be fine.  I assume __always_inline was chosen rather than
_GLIBCXX_ALWAYS_INLINE for consistency with glibc?
Jonathan Wakely - March 18, 2013, 4:30 p.m.
On 18 March 2013 16:28, Jonathan Wakely wrote:
> On 16 March 2013 13:29, Andi Kleen wrote:
>>
>> With inline __attribute__((always_inline)) these functions
>> get inlined even with -O0.
>>
>> I hardcoded the attribute in the header for now, assuming
>> that all compilers that support libstdc++ have attribute
>> always_inline too. If not, it would need to be moved
>> as a macro to c++config.h with appropriate ifdefs.
>
> That should be fine.  I assume __always_inline was chosen rather than
> _GLIBCXX_ALWAYS_INLINE for consistency with glibc?

Ah, I see it's also being used in libitm and libatomic, not just libstdc++.
Andi Kleen - March 18, 2013, 4:34 p.m.
On Mon, Mar 18, 2013 at 04:28:13PM +0000, Jonathan Wakely wrote:
> On 16 March 2013 13:29, Andi Kleen wrote:
> >
> > With inline __attribute__((always_inline)) these functions
> > get inlined even with -O0.
> >
> > I hardcoded the attribute in the header for now, assuming
> > that all compilers that support libstdc++ have attribute
> > always_inline too. If not, it would need to be moved
> > as a macro to c++config.h with appropriate ifdefs.
> 
> That should be fine.  I assume __always_inline was chosen rather than
> _GLIBCXX_ALWAYS_INLINE for consistency with glibc?

Actually with the Linux kernel, but it's pretty arbitrary to be honest.

-Andi
Jakub Jelinek - March 19, 2013, 6:46 a.m.
On Mon, Mar 18, 2013 at 04:28:13PM +0000, Jonathan Wakely wrote:
> On 16 March 2013 13:29, Andi Kleen wrote:
> >
> > With inline __attribute__((always_inline)) these functions
> > get inlined even with -O0.
> >
> > I hardcoded the attribute in the header for now, assuming
> > that all compilers that support libstdc++ have attribute
> > always_inline too. If not, it would need to be moved
> > as a macro to c++config.h with appropriate ifdefs.
> 
> That should be fine.  I assume __always_inline was chosen rather than
> _GLIBCXX_ALWAYS_INLINE for consistency with glibc?

Using __always_inline as the name of the macro is a bad idea, glibc
headers use that macro already.  Just use something else in public headers
that aren't part of glibc.

	Jakub
Andi Kleen - March 19, 2013, 3:51 p.m.
> Using __always_inline as the name of the macro is a bad idea, glibc
> headers use that macro already.  Just use something else in public headers
> that aren't part of glibc.

That's why I had the ifdef, but ok.  I'll use __force_inline then.

-Andi
Jakub Jelinek - March 19, 2013, 4:10 p.m.
On Tue, Mar 19, 2013 at 08:51:21AM -0700, Andi Kleen wrote:
> > Using __always_inline as the name of the macro is a bad idea, glibc
> > headers use that macro already.  Just use something else in public headers
> > that aren't part of glibc.
> 
> That's why I had the ifdef, but ok.  I'll use __force_inline then.

I'd say Jonathan's _GLIBCXX_ALWAYS_INLINE would be better.

BTW, have you verified always_inline works fine for -O0?

int x;

inline __attribute__((always_inline)) bool
test_and_set(int __m = 5)
{
  return __atomic_test_and_set (&x, __m);
}

int
foo (void)
{
  return test_and_set (65536 | 5);
}

with -O0 -mhle doesn't result in xacquire, guess for !optimize
get_memmodel would need to look through chain of SSA_NAMEs if SSA_NAME
(that can appear because of inlining), looking for INTEGER_CSTs.
  If there is:
  _7 = 0x10005;
  _8 = _7;
  _9 = _8;
  __atomic_test_and_set (&x, _9);
it should still return 0x10005 rather than MEMMODEL_SEQ_CST.

	Jakub
Andi Kleen - March 19, 2013, 5:30 p.m.
On Tue, Mar 19, 2013 at 05:10:22PM +0100, Jakub Jelinek wrote:
> On Tue, Mar 19, 2013 at 08:51:21AM -0700, Andi Kleen wrote:
> > > Using __always_inline as the name of the macro is a bad idea, glibc
> > > headers use that macro already.  Just use something else in public headers
> > > that aren't part of glibc.
> > 
> > That's why I had the ifdef, but ok.  I'll use __force_inline then.
> 
> I'd say Jonathan's _GLIBCXX_ALWAYS_INLINE would be better.
> 
> BTW, have you verified always_inline works fine for -O0?

I did some simple tests and it seemed to work.

But thanks for the counter example.

> 
> with -O0 -mhle doesn't result in xacquire, guess for !optimize
> get_memmodel would need to look through chain of SSA_NAMEs if SSA_NAME
> (that can appear because of inlining), looking for INTEGER_CSTs.

Interesting, so more fixes are needed. I'll look into that. Is there
already a convenient helper for it?

There was an alternative approach of using a template, but I suppose
that would have the same problem.


-Andi
Richard Guenther - March 20, 2013, 10:38 a.m.
On Tue, Mar 19, 2013 at 6:30 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Tue, Mar 19, 2013 at 05:10:22PM +0100, Jakub Jelinek wrote:
>> On Tue, Mar 19, 2013 at 08:51:21AM -0700, Andi Kleen wrote:
>> > > Using __always_inline as the name of the macro is a bad idea, glibc
>> > > headers use that macro already.  Just use something else in public headers
>> > > that aren't part of glibc.
>> >
>> > That's why I had the ifdef, but ok.  I'll use __force_inline then.
>>
>> I'd say Jonathan's _GLIBCXX_ALWAYS_INLINE would be better.
>>
>> BTW, have you verified always_inline works fine for -O0?
>
> I did some simple tests and it seemed to work.
>
> But thanks for the counter example.
>
>>
>> with -O0 -mhle doesn't result in xacquire, guess for !optimize
>> get_memmodel would need to look through chain of SSA_NAMEs if SSA_NAME
>> (that can appear because of inlining), looking for INTEGER_CSTs.
>
> interesting. so need more fixes. i'll look into that. Is there
> already a convenient helper for it?

Not without using information created by TER which is disabled for this
case because of different line information.  If TER would not be disabled
for this reason it would already work automagically.

Eventually we can relax this restriction for constants that are only fed
through temporaries ...

> There was an alternative approach of using a template, but I suppose
> that would have the same problem.

Possibly yes, but I guess the frontend substitutes without going through
an extra temporary (if all arguments are template parameters!)

Richard.

>
> -Andi
Jakub Jelinek - March 20, 2013, 10:46 a.m.
On Wed, Mar 20, 2013 at 11:38:03AM +0100, Richard Biener wrote:
> Not without using information created by TER which is disabled for this
> case because of different line information.  If TER would not be disabled
> for this reason it would already work automagically.

Would relaxing that in TER for constants help in the case of:
static inline __attribute__((always_inline)) ...
foo (..., int m = __ATOMIC_SEQ_CST)
{
  if (something)
    bar ();
  else
    baz ();
  __atomic_test_and_set (&x, m);
}

void
test ()
{
  foo (..., __ATOMIC_HLE_ACQUIRE | __ATOMIC_SEQ_CST);
}
though?  I'd think as the temp = 0x10005; would be in a different bb, TER
wouldn't do anything here, for -O1 of course CCP or similar would propagate
that, but for -O0 we'd still have to walk the chain of SSA_NAMEs.

	Jakub
Richard Guenther - March 20, 2013, 11:20 a.m.
On Wed, Mar 20, 2013 at 11:46 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Mar 20, 2013 at 11:38:03AM +0100, Richard Biener wrote:
>> Not without using information created by TER which is disabled for this
>> case because of different line information.  If TER would not be disabled
>> for this reason it would already work automagically.
>
> Would relaxing that in TER for constants help in the case of:
> static inline __attribute__((always_inline)) ...
> foo (..., int m = __ATOMIC_SEQ_CST)
> {
>   if (something)
>     bar ();
>   else
>     baz ();
>   __atomic_test_and_set (&x, m);
> }
>
> void
> test ()
> {
>   foo (..., __ATOMIC_HLE_ACQUIRE | __ATOMIC_SEQ_CST);
> }
> though?  I'd think as the temp = 0x10005; would be in a different bb, TER
> wouldn't do anything here, for -O1 of course CCP or similar would propagate
> that, but for -O0 we'd still have to walk the chain of SSA_NAMEs.

True.  I was for a long time thinking of running a CCP pass at -O0 ...
and only restricting what we substitute into (builtin arguments and asm operands).

Walking the SSA def chain would also be possible - preferably from
cfgexpand.c when we process the stmt.  But you'll eventually lose
debug info like for

  int i = 1;
  __builtin_foo (i);

when we then never expand the SSA name def i_2 = 1 because we are
never visiting its use ...

Richard.

>         Jakub
Jakub Jelinek - March 20, 2013, 11:24 a.m.
On Wed, Mar 20, 2013 at 12:20:27PM +0100, Richard Biener wrote:
> Walking the SSA def chain would also be possible - preferably from
> cfgexpand.c when we process the stmt.  But you'll eventually lose
> debug info like for
> 
>   int i = 1;
>   __builtin_foo (i);
> 
> when we then never expand the SSA name def i_2 = 1 because we are
> never visiting its use ...

If i is a user variable, then for -O0 we'd better create a memory slot for
it and expand the i = 1 store as store to that memory location.

	Jakub
Richard Guenther - March 20, 2013, 11:28 a.m.
On Wed, Mar 20, 2013 at 12:24 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Mar 20, 2013 at 12:20:27PM +0100, Richard Biener wrote:
>> Walking the SSA def chain would also be possible - preferably from
>> cfgexpand.c when we process the stmt.  But you'll eventually lose
>> debug info like for
>>
>>   int i = 1;
>>   __builtin_foo (i);
>>
>> when we then never expand the SSA name def i_2 = 1 because we are
>> never visiting its use ...
>
> If i is a user variable, then for -O0 we'd better create a memory slot for
> it and expand the i = 1 store as store to that memory location.

Ok, we seem to do that...

Richard.

>         Jakub

Patch

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 609fe8b..475ffa0 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -37,6 +37,10 @@ 
 #include <stdint.h>
 #include <bits/atomic_lockfree_defines.h>
 
+#ifndef __always_inline
+#define __always_inline inline __attribute__((always_inline))
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -94,11 +98,11 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       | (__m & __memory_order_modifier_mask));
   }
 
-  inline void
+  __always_inline void
   atomic_thread_fence(memory_order __m) noexcept
   { __atomic_thread_fence(__m); }
 
-  inline void
+  __always_inline void
   atomic_signal_fence(memory_order __m) noexcept
   { __atomic_signal_fence(__m); }
 
@@ -281,19 +285,19 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       : __atomic_flag_base{ _S_init(__i) }
     { }
 
-    bool
+    __always_inline bool
     test_and_set(memory_order __m = memory_order_seq_cst) noexcept
     {
       return __atomic_test_and_set (&_M_i, __m);
     }
 
-    bool
+    __always_inline bool
     test_and_set(memory_order __m = memory_order_seq_cst) volatile noexcept
     {
       return __atomic_test_and_set (&_M_i, __m);
     }
 
-    void
+    __always_inline void
     clear(memory_order __m = memory_order_seq_cst) noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -304,7 +308,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       __atomic_clear (&_M_i, __m);
     }
 
-    void
+    __always_inline void
     clear(memory_order __m = memory_order_seq_cst) volatile noexcept
     {
       memory_order __b = __m & __memory_order_mask;
@@ -463,7 +467,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       is_lock_free() const volatile noexcept
       { return __atomic_is_lock_free(sizeof(_M_i), nullptr); }
 
-      void
+      __always_inline void
       store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
       {
         memory_order __b = __m & __memory_order_mask;
@@ -474,7 +478,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__atomic_store_n(&_M_i, __i, __m);
       }
 
-      void
+      __always_inline void
       store(__int_type __i,
 	    memory_order __m = memory_order_seq_cst) volatile noexcept
       {
@@ -486,7 +490,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__atomic_store_n(&_M_i, __i, __m);
       }
 
-      __int_type
+      __always_inline __int_type
       load(memory_order __m = memory_order_seq_cst) const noexcept
       {
        memory_order __b = __m & __memory_order_mask;
@@ -496,7 +500,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_load_n(&_M_i, __m);
       }
 
-      __int_type
+      __always_inline __int_type
       load(memory_order __m = memory_order_seq_cst) const volatile noexcept
       {
         memory_order __b = __m & __memory_order_mask;
@@ -506,7 +510,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_load_n(&_M_i, __m);
       }
 
-      __int_type
+      __always_inline __int_type
       exchange(__int_type __i,
 	       memory_order __m = memory_order_seq_cst) noexcept
       {
@@ -514,14 +518,14 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       }
 
 
-      __int_type
+      __always_inline __int_type
       exchange(__int_type __i,
 	       memory_order __m = memory_order_seq_cst) volatile noexcept
       {
 	return __atomic_exchange_n(&_M_i, __i, __m);
       }
 
-      bool
+      __always_inline bool
       compare_exchange_weak(__int_type& __i1, __int_type __i2,
 			    memory_order __m1, memory_order __m2) noexcept
       {
@@ -534,7 +538,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 1, __m1, __m2);
       }
 
-      bool
+      __always_inline bool
       compare_exchange_weak(__int_type& __i1, __int_type __i2,
 			    memory_order __m1,
 			    memory_order __m2) volatile noexcept
@@ -548,7 +552,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 1, __m1, __m2);
       }
 
-      bool
+      __always_inline bool
       compare_exchange_weak(__int_type& __i1, __int_type __i2,
 			    memory_order __m = memory_order_seq_cst) noexcept
       {
@@ -556,7 +560,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				     __cmpexch_failure_order(__m));
       }
 
-      bool
+      __always_inline bool
       compare_exchange_weak(__int_type& __i1, __int_type __i2,
 		   memory_order __m = memory_order_seq_cst) volatile noexcept
       {
@@ -564,7 +568,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				     __cmpexch_failure_order(__m));
       }
 
-      bool
+      __always_inline bool
       compare_exchange_strong(__int_type& __i1, __int_type __i2,
 			      memory_order __m1, memory_order __m2) noexcept
       {
@@ -577,7 +581,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 0, __m1, __m2);
       }
 
-      bool
+      __always_inline bool
       compare_exchange_strong(__int_type& __i1, __int_type __i2,
 			      memory_order __m1,
 			      memory_order __m2) volatile noexcept
@@ -592,7 +596,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 0, __m1, __m2);
       }
 
-      bool
+      __always_inline bool
       compare_exchange_strong(__int_type& __i1, __int_type __i2,
 			      memory_order __m = memory_order_seq_cst) noexcept
       {
@@ -600,7 +604,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				       __cmpexch_failure_order(__m));
       }
 
-      bool
+      __always_inline bool
       compare_exchange_strong(__int_type& __i1, __int_type __i2,
 		 memory_order __m = memory_order_seq_cst) volatile noexcept
       {
@@ -608,52 +612,52 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				       __cmpexch_failure_order(__m));
       }
 
-      __int_type
+      __always_inline __int_type
       fetch_add(__int_type __i,
 		memory_order __m = memory_order_seq_cst) noexcept
       { return __atomic_fetch_add(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_add(__int_type __i,
 		memory_order __m = memory_order_seq_cst) volatile noexcept
       { return __atomic_fetch_add(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_sub(__int_type __i,
 		memory_order __m = memory_order_seq_cst) noexcept
       { return __atomic_fetch_sub(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_sub(__int_type __i,
 		memory_order __m = memory_order_seq_cst) volatile noexcept
       { return __atomic_fetch_sub(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_and(__int_type __i,
 		memory_order __m = memory_order_seq_cst) noexcept
       { return __atomic_fetch_and(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_and(__int_type __i,
 		memory_order __m = memory_order_seq_cst) volatile noexcept
       { return __atomic_fetch_and(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_or(__int_type __i,
 	       memory_order __m = memory_order_seq_cst) noexcept
       { return __atomic_fetch_or(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_or(__int_type __i,
 	       memory_order __m = memory_order_seq_cst) volatile noexcept
       { return __atomic_fetch_or(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_xor(__int_type __i,
 		memory_order __m = memory_order_seq_cst) noexcept
       { return __atomic_fetch_xor(&_M_i, __i, __m); }
 
-      __int_type
+      __always_inline __int_type
       fetch_xor(__int_type __i,
 		memory_order __m = memory_order_seq_cst) volatile noexcept
       { return __atomic_fetch_xor(&_M_i, __i, __m); }
@@ -770,7 +774,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       is_lock_free() const volatile noexcept
       { return __atomic_is_lock_free(_M_type_size(1), nullptr); }
 
-      void
+      __always_inline void
       store(__pointer_type __p,
 	    memory_order __m = memory_order_seq_cst) noexcept
       {
@@ -783,7 +787,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__atomic_store_n(&_M_p, __p, __m);
       }
 
-      void
+      __always_inline void
       store(__pointer_type __p,
 	    memory_order __m = memory_order_seq_cst) volatile noexcept
       {
@@ -795,7 +799,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__atomic_store_n(&_M_p, __p, __m);
       }
 
-      __pointer_type
+      __always_inline __pointer_type
       load(memory_order __m = memory_order_seq_cst) const noexcept
       {
         memory_order __b = __m & __memory_order_mask;
@@ -805,7 +809,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_load_n(&_M_p, __m);
       }
 
-      __pointer_type
+      __always_inline __pointer_type
       load(memory_order __m = memory_order_seq_cst) const volatile noexcept
       {
         memory_order __b = __m & __memory_order_mask;
@@ -815,7 +819,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_load_n(&_M_p, __m);
       }
 
-      __pointer_type
+      __always_inline __pointer_type
       exchange(__pointer_type __p,
 	       memory_order __m = memory_order_seq_cst) noexcept
       {
@@ -823,14 +827,14 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       }
 
 
-      __pointer_type
+      __always_inline __pointer_type
       exchange(__pointer_type __p,
 	       memory_order __m = memory_order_seq_cst) volatile noexcept
       {
 	return __atomic_exchange_n(&_M_p, __p, __m);
       }
 
-      bool
+      __always_inline bool
       compare_exchange_strong(__pointer_type& __p1, __pointer_type __p2,
 			      memory_order __m1,
 			      memory_order __m2) noexcept
@@ -844,7 +848,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_compare_exchange_n(&_M_p, &__p1, __p2, 0, __m1, __m2);
       }
 
-      bool
+      __always_inline bool
       compare_exchange_strong(__pointer_type& __p1, __pointer_type __p2,
 			      memory_order __m1,
 			      memory_order __m2) volatile noexcept
@@ -859,22 +863,22 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __atomic_compare_exchange_n(&_M_p, &__p1, __p2, 0, __m1, __m2);
       }
 
-      __pointer_type
+      __always_inline __pointer_type
       fetch_add(ptrdiff_t __d,
 		memory_order __m = memory_order_seq_cst) noexcept
       { return __atomic_fetch_add(&_M_p, _M_type_size(__d), __m); }
 
-      __pointer_type
+      __always_inline __pointer_type
       fetch_add(ptrdiff_t __d,
 		memory_order __m = memory_order_seq_cst) volatile noexcept
       { return __atomic_fetch_add(&_M_p, _M_type_size(__d), __m); }
 
-      __pointer_type
+      __always_inline __pointer_type
       fetch_sub(ptrdiff_t __d,
 		memory_order __m = memory_order_seq_cst) noexcept
       { return __atomic_fetch_sub(&_M_p, _M_type_size(__d), __m); }
 
-      __pointer_type
+      __always_inline __pointer_type
       fetch_sub(ptrdiff_t __d,
 		memory_order __m = memory_order_seq_cst) volatile noexcept
       { return __atomic_fetch_sub(&_M_p, _M_type_size(__d), __m); }