diff mbox series

[committed] libstdc++: Optimize ref-count updates in COW std::string

Message ID 20211201150816.217497-1-jwakely@redhat.com
State New
Headers show
Series [committed] libstdc++: Optimize ref-count updates in COW std::string | expand

Commit Message

Jonathan Wakely Dec. 1, 2021, 3:08 p.m. UTC
Tested x86_64-linux, pushed to trunk.


Most ref-count updates in the COW string are done via the functions in
<ext/atomicity.h>, which will use non-atomic ops when the program is
known to be single-threaded. The _M_is_leaked() and _M_is_shared()
functions use __atomic_load_n directly, because <ext/atomicity.h>
doesn't provide a load operation. Those functions can check the
__is_single_threaded() predicate to avoid using __atomic_load_n when not
needed.

The move constructor for the fully-dynamic-string increments the
ref-count by either 2 or 1, for leaked or non-leaked strings
respectively. That can be changed to use a non-atomic store of 1 for all
non-shared strings. It can be non-atomic because even if the program is
multi-threaded, conflicting access to the rvalue object while it's being
moved from would be data race anyway. It can store 1 directly for all
non-shared strings because it doesn't matter whether the initial
refcount was -1 or 0, it should be 1 after the move constructor creates
a second owner.

libstdc++-v3/ChangeLog:

	* include/bits/cow_string.h (basic_string::_M_is_leaked): Use
	non-atomic load when __is_single_threaded() is true.
	(basic_string::_M_is_shared): Likewise.
	(basic_string::(basic_string&&)) [_GLIBCXX_FULLY_DYNAMIC_STRING]:
	Use non-atomic store when rvalue is not shared.
---
 libstdc++-v3/include/bits/cow_string.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Comments

Florian Weimer Dec. 1, 2021, 6:16 p.m. UTC | #1
* Jonathan Wakely via Libstdc:

> diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> index ced395b80b8..4fae1d02981 100644
> --- a/libstdc++-v3/include/bits/cow_string.h
> +++ b/libstdc++-v3/include/bits/cow_string.h
> @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     *  destroy the empty-string _Rep object.
>     *
>     *  All but the last paragraph is considered pretty conventional
> -   *  for a C++ string implementation.
> +   *  for a Copy-On-Write C++ string implementation.
>    */
>    // 21.3  Template class basic_string
>    template<typename _CharT, typename _Traits, typename _Alloc>
> @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  	  // so we need to use an atomic load. However, _M_is_leaked
>  	  // predicate does not change concurrently (i.e. the string is either
>  	  // leaked or not), so a relaxed load is enough.
> -	  return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> -#else
> -	  return this->_M_refcount < 0;
> +	  if (!__gnu_cxx::__is_single_threaded())
> +	    return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
>  #endif
> +	  return this->_M_refcount < 0;
>  	}

Relaxed MO loads of word-size values on all current architectures only
have a compiler barrier, so I think the optimization makes things worse?
(I doubt the conditional lack of a compiler barrier leads to
optimization improvements elsewhere.)

Thanks,
Florian
Jonathan Wakely Dec. 1, 2021, 6:24 p.m. UTC | #2
On Wed, 1 Dec 2021 at 18:16, Florian Weimer wrote:
>
> * Jonathan Wakely via Libstdc:
>
> > diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> > index ced395b80b8..4fae1d02981 100644
> > --- a/libstdc++-v3/include/bits/cow_string.h
> > +++ b/libstdc++-v3/include/bits/cow_string.h
> > @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >     *  destroy the empty-string _Rep object.
> >     *
> >     *  All but the last paragraph is considered pretty conventional
> > -   *  for a C++ string implementation.
> > +   *  for a Copy-On-Write C++ string implementation.
> >    */
> >    // 21.3  Template class basic_string
> >    template<typename _CharT, typename _Traits, typename _Alloc>
> > @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >         // so we need to use an atomic load. However, _M_is_leaked
> >         // predicate does not change concurrently (i.e. the string is either
> >         // leaked or not), so a relaxed load is enough.
> > -       return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> > -#else
> > -       return this->_M_refcount < 0;
> > +       if (!__gnu_cxx::__is_single_threaded())
> > +         return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> >  #endif
> > +       return this->_M_refcount < 0;
> >       }
>
> Relaxed MO loads of word-size values on all current architectures only
> have a compiler barrier, so I think the optimization makes things worse?

Hmm, yes.

> (I doubt the conditional lack of a compiler barrier leads to
> optimization improvements elsewhere.)

Probably not. I'll revert the change to _M_is_leaked() and just keep
it for _M_is_shared().

Thanks for pointing that out.
diff mbox series

Patch

diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
index ced395b80b8..4fae1d02981 100644
--- a/libstdc++-v3/include/bits/cow_string.h
+++ b/libstdc++-v3/include/bits/cow_string.h
@@ -105,7 +105,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
    *  destroy the empty-string _Rep object.
    *
    *  All but the last paragraph is considered pretty conventional
-   *  for a C++ string implementation.
+   *  for a Copy-On-Write C++ string implementation.
   */
   // 21.3  Template class basic_string
   template<typename _CharT, typename _Traits, typename _Alloc>
@@ -207,10 +207,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  // so we need to use an atomic load. However, _M_is_leaked
 	  // predicate does not change concurrently (i.e. the string is either
 	  // leaked or not), so a relaxed load is enough.
-	  return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
-#else
-	  return this->_M_refcount < 0;
+	  if (!__gnu_cxx::__is_single_threaded())
+	    return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
 #endif
+	  return this->_M_refcount < 0;
 	}
 
 	bool
@@ -222,10 +222,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  // but one reference concurrently with this check, so we need this
 	  // load to be acquire to synchronize with release fetch_and_add in
 	  // _M_dispose.
-	  return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0;
-#else
-	  return this->_M_refcount > 0;
+	  if (!__gnu_cxx::__is_single_threaded())
+	    return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0;
 #endif
+	  return this->_M_refcount > 0;
 	}
 
 	void
@@ -629,12 +629,12 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #else
 	// Rather than allocate an empty string for the rvalue string,
 	// just share ownership with it by incrementing the reference count.
-	// If the rvalue string was "leaked" then it was the unique owner,
-	// so need an extra increment to indicate shared ownership.
-	if (_M_rep()->_M_is_leaked())
-	  __gnu_cxx::__atomic_add_dispatch(&_M_rep()->_M_refcount, 2);
-	else
+	// If the rvalue string was the unique owner then there are exactly
+	// two owners now.
+	if (_M_rep()->_M_is_shared())
 	  __gnu_cxx::__atomic_add_dispatch(&_M_rep()->_M_refcount, 1);
+	else
+	  _M_rep()->_M_refcount = 1;
 #endif
       }