Message ID | 20211201150816.217497-1-jwakely@redhat.com |
---|---|
State | New |
Headers | show |
Series | [committed] libstdc++: Optimize ref-count updates in COW std::string | expand |
* Jonathan Wakely via Libstdc: > diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h > index ced395b80b8..4fae1d02981 100644 > --- a/libstdc++-v3/include/bits/cow_string.h > +++ b/libstdc++-v3/include/bits/cow_string.h > @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > * destroy the empty-string _Rep object. > * > * All but the last paragraph is considered pretty conventional > - * for a C++ string implementation. > + * for a Copy-On-Write C++ string implementation. > */ > // 21.3 Template class basic_string > template<typename _CharT, typename _Traits, typename _Alloc> > @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > // so we need to use an atomic load. However, _M_is_leaked > // predicate does not change concurrently (i.e. the string is either > // leaked or not), so a relaxed load is enough. > - return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0; > -#else > - return this->_M_refcount < 0; > + if (!__gnu_cxx::__is_single_threaded()) > + return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0; > #endif > + return this->_M_refcount < 0; > } Relaxed MO loads of word-size values on all current architectures only have a compiler barrier, so I think the optimization makes things worse? (I doubt the conditional lack of a compiler barrier leads to optimization improvements elsewhere.) Thanks, Florian
On Wed, 1 Dec 2021 at 18:16, Florian Weimer wrote: > > * Jonathan Wakely via Libstdc: > > > diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h > > index ced395b80b8..4fae1d02981 100644 > > --- a/libstdc++-v3/include/bits/cow_string.h > > +++ b/libstdc++-v3/include/bits/cow_string.h > > @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > * destroy the empty-string _Rep object. > > * > > * All but the last paragraph is considered pretty conventional > > - * for a C++ string implementation. > > + * for a Copy-On-Write C++ string implementation. > > */ > > // 21.3 Template class basic_string > > template<typename _CharT, typename _Traits, typename _Alloc> > > @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > // so we need to use an atomic load. However, _M_is_leaked > > // predicate does not change concurrently (i.e. the string is either > > // leaked or not), so a relaxed load is enough. > > - return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0; > > -#else > > - return this->_M_refcount < 0; > > + if (!__gnu_cxx::__is_single_threaded()) > > + return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0; > > #endif > > + return this->_M_refcount < 0; > > } > > Relaxed MO loads of word-size values on all current architectures only > have a compiler barrier, so I think the optimization makes things worse? Hmm, yes. > (I doubt the conditional lack of a compiler barrier leads to > optimization improvements elsewhere.) Probably not. I'll revert the change to _M_is_leaked() and just keep it for _M_is_shared(). Thanks for pointing that out.
diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h index ced395b80b8..4fae1d02981 100644 --- a/libstdc++-v3/include/bits/cow_string.h +++ b/libstdc++-v3/include/bits/cow_string.h @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * destroy the empty-string _Rep object. * * All but the last paragraph is considered pretty conventional - * for a C++ string implementation. + * for a Copy-On-Write C++ string implementation. */ // 21.3 Template class basic_string template<typename _CharT, typename _Traits, typename _Alloc> @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // so we need to use an atomic load. However, _M_is_leaked // predicate does not change concurrently (i.e. the string is either // leaked or not), so a relaxed load is enough. - return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0; -#else - return this->_M_refcount < 0; + if (!__gnu_cxx::__is_single_threaded()) + return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0; #endif + return this->_M_refcount < 0; } bool @@ -222,10 +222,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // but one reference concurrently with this check, so we need this // load to be acquire to synchronize with release fetch_and_add in // _M_dispose. - return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0; -#else - return this->_M_refcount > 0; + if (!__gnu_cxx::__is_single_threaded()) + return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0; #endif + return this->_M_refcount > 0; } void @@ -629,12 +629,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #else // Rather than allocate an empty string for the rvalue string, // just share ownership with it by incrementing the reference count. - // If the rvalue string was "leaked" then it was the unique owner, - // so need an extra increment to indicate shared ownership. - if (_M_rep()->_M_is_leaked()) - __gnu_cxx::__atomic_add_dispatch(&_M_rep()->_M_refcount, 2); - else + // If the rvalue string was the unique owner then there are exactly + // two owners now. + if (_M_rep()->_M_is_shared()) __gnu_cxx::__atomic_add_dispatch(&_M_rep()->_M_refcount, 1); + else + _M_rep()->_M_refcount = 1; #endif }