Message ID | cda87f91-d9fd-692b-f60e-dcc92728f02e@gmail.com |
---|---|
State | New |
Headers | show |
On Mon, 15 May 2017, François Dumont wrote: > I also added some optimizations. Especially replacement of std::fill with > calls to __builtin_memset. Has anyone ever proposed to optimize std::fill in > such a way ? It would require a test on the value used to fill the range but > it might worth this additional runtime check, no ? Note that with -O3, gcc recognizes the pattern in std::fill and generates a call to memset (there is a bit too much extra code around the memset, but a couple match.pd transformations should fix that). That doesn't mean we can't save it the work. If you want to save the runtime check, there is always __builtin_constant_p... The __fill_bvector part of the fill overload for vector<bool> could do with some improvements as well. Looping is unnecessary, one just needs to produce the right mask and and or or with it, that shouldn't take more than 4 instructions or so. There was a time when I suggested overloading std::count and std::find in order to use __builtin_popcount, etc. But from what I've seen of committee discussions, I expect that there will be specialized algorithms (possibly member functions) eventually, making the overload less useful.
On 15/05/2017 21:31, Marc Glisse wrote: > On Mon, 15 May 2017, François Dumont wrote: > >> I also added some optimizations. Especially replacement of >> std::fill with calls to __builtin_memset. Has anyone ever proposed to >> optimize std::fill in such a way ? It would require a test on the >> value used to fill the range but it might worth this additional >> runtime check, no ? > > Note that with -O3, gcc recognizes the pattern in std::fill and > generates a call to memset (there is a bit too much extra code around > the memset, but a couple match.pd transformations should fix that). Good to know, at least g++ will be able to spend more time on other optimizations :-) What is match.pd ? > That doesn't mean we can't save it the work. If you want to save the > runtime check, there is always __builtin_constant_p... Good point, I will give it a try. > > The __fill_bvector part of the fill overload for vector<bool> could do > with some improvements as well. Looping is unnecessary, one just needs > to produce the right mask and and or or with it, that shouldn't take > more than 4 instructions or so. Yes, good idear, I'll submit another patch after this one. > > There was a time when I suggested overloading std::count and std::find > in order to use __builtin_popcount, etc. But from what I've seen of > committee discussions, I expect that there will be specialized > algorithms (possibly member functions) eventually, making the overload > less useful. > ok, thanks for those feedbacks. François
On Tue, 16 May 2017, François Dumont wrote:
> What is match.pd ?
It is a file in gcc where we describe simple pattern-matching
optimizations. In this case, IIRC, the missing transformations were
* ptr + n == ptr + 1 --> n == 1 (we already do it if 1 is replaced by a
variable or 0)
* ((n/8)+1)*8 --> n+8 when the division is known to be exact
On 15/05/17 19:57 +0200, François Dumont wrote: >Hi > > Following what I have started on RbTree here is a patch to default >implementation of default and move constructors on std::vector<bool>. > > As in _Rb_tree_impl the default allocator is not value initialized >anymore. We could add a small helper type arround the allocator to do >this value initialization per default. Should I do so ? It's required to be value-initialized, so if your patch changes that then it's a problem. Did we decide it's OK to do that for RB-trees? Did we actually discuss that part of the r243379 changes? N.B. defining these members as defaulted makes diagnostics worse, see PR 80858, but I think we need to fix that in the compiler anyway.
On 25/05/2017 18:28, Jonathan Wakely wrote: > On 15/05/17 19:57 +0200, François Dumont wrote: >> Hi >> >> Following what I have started on RbTree here is a patch to default >> implementation of default and move constructors on std::vector<bool>. >> >> As in _Rb_tree_impl the default allocator is not value initialized >> anymore. We could add a small helper type arround the allocator to do >> this value initialization per default. Should I do so ? > > It's required to be value-initialized, so if your patch changes that > then it's a problem. > > Did we decide it's OK to do that for RB-trees? Did we actually discuss > that part of the r243379 changes? I remember a message pointing this issue but after the commit AFAIR. I thought it was from Tim but I can't find it on the archive. What is the rational of this requirement ? I started working on a type to do the allocator value initialization if there is no default constructor but it seems quite complicated to do so. It is quite sad that we can't fully benefit from this nice C++11 feature just because of this requirement. If there is any initialization needed it doesn't sound complicated to provide a default constructor. > > > N.B. defining these members as defaulted makes diagnostics worse, see > PR 80858, but I think we need to fix that in the compiler anyway. > > Ok, I hope compiler will be able to improve this situtation. François
On 26/05/17 23:13 +0200, François Dumont wrote: >On 25/05/2017 18:28, Jonathan Wakely wrote: >>On 15/05/17 19:57 +0200, François Dumont wrote: >>>Hi >>> >>> Following what I have started on RbTree here is a patch to >>>default implementation of default and move constructors on >>>std::vector<bool>. >>> >>> As in _Rb_tree_impl the default allocator is not value >>>initialized anymore. We could add a small helper type arround the >>>allocator to do this value initialization per default. Should I do >>>so ? >> >>It's required to be value-initialized, so if your patch changes that >>then it's a problem. >> >>Did we decide it's OK to do that for RB-trees? Did we actually discuss >>that part of the r243379 changes? > >I remember a message pointing this issue but after the commit AFAIR. I >thought it was from Tim but I can't find it on the archive. > >What is the rational of this requirement ? I started working on a type >to do the allocator value initialization if there is no default >constructor but it seems quite complicated to do so. It is quite sad >that we can't fully benefit from this nice C++11 feature just because >of this requirement. If there is any initialization needed it doesn't >sound complicated to provide a default constructor. The standard says that the default constructor is: vector() : vector(Allocator()) { } That value-initializes the allocator. If the allocator type behaves differently for value-init and default-init (e.g. it has data members that are left uninitialized by default-init) then the difference matters. If you change the code so it only does default-init of the allocator then you will introduce an observable difference. I don't see any requirement that a DefaultConstructible allocator cannot leave members uninitialized, so that means the standard requires default construction of vector<bool> to value-init the allocator. Not default-init. Here's an allocator that behaves differently if value-initialized or default-initialized: template<typename T> struct Alloc { using value_type = T; Alloc() = default; template<typename U> Alloc(const Alloc<U>& a) : mem(a.mem) { } T* allocate(std::size_t n) { if (mem) throw 1; return std::allocator<T>().allocate(n); } void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); } int mem; }; template<typename T, typename U> bool operator==(const Alloc<T>& t, const Alloc<U>& u) { return t.mem == u.mem; } template<typename T, typename U> bool operator!=(const Alloc<T>& t, const Alloc<U>& u) { return !(t == u); }
diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h index 37e000a..a6afced 100644 --- a/libstdc++-v3/include/bits/stl_bvector.h +++ b/libstdc++-v3/include/bits/stl_bvector.h @@ -399,8 +399,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER { if (__first._M_p != __last._M_p) { - std::fill(__first._M_p + 1, __last._M_p, __x ? ~0 : 0); - __fill_bvector(__first, _Bit_iterator(__first._M_p + 1, 0), __x); + _Bit_type *__first_p = __first._M_p; + if (__first._M_offset != 0) + __fill_bvector(__first, _Bit_iterator(++__first_p, 0), __x); + + __builtin_memset(__first_p, __x ? ~0 : 0, + (__last._M_p - __first_p) * sizeof(_Bit_type)); + + if (__last._M_offset != 0) __fill_bvector(_Bit_iterator(__last._M_p, 0), __last, __x); } else @@ -416,33 +422,66 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _Bit_alloc_traits; typedef typename _Bit_alloc_traits::pointer _Bit_pointer; - struct _Bvector_impl - : public _Bit_alloc_type + struct _Bvector_impl_data { _Bit_iterator _M_start; _Bit_iterator _M_finish; _Bit_pointer _M_end_of_storage; + _Bvector_impl_data() _GLIBCXX_NOEXCEPT + : _M_start(), _M_finish(), _M_end_of_storage() + { } + +#if __cplusplus >= 201103L + _Bvector_impl_data(_Bvector_impl_data&& __x) noexcept + : _M_start(__x._M_start), _M_finish(__x._M_finish) + , _M_end_of_storage(__x._M_end_of_storage) + { __x._M_reset(); } + + void + _M_move_data(_Bvector_impl_data&& __x) noexcept + { + this->_M_start = __x._M_start; + this->_M_finish = __x._M_finish; + this->_M_end_of_storage = __x._M_end_of_storage; + __x._M_reset(); + } + + void + _M_reset() noexcept + { + this->_M_start = _Bit_iterator(); + this->_M_finish = _Bit_iterator(); + this->_M_end_of_storage = nullptr; + } +#endif + }; + + struct _Bvector_impl + : public _Bit_alloc_type, public _Bvector_impl_data + { + public: +#if __cplusplus >= 201103L + _Bvector_impl() = default; +#else _Bvector_impl() - : _Bit_alloc_type(), _M_start(), _M_finish(), _M_end_of_storage() + : _Bit_alloc_type() { } +#endif _Bvector_impl(const _Bit_alloc_type& __a) - : _Bit_alloc_type(__a), _M_start(), _M_finish(), _M_end_of_storage() + : _Bit_alloc_type(__a) { } #if __cplusplus >= 201103L - _Bvector_impl(_Bit_alloc_type&& __a) - : _Bit_alloc_type(std::move(__a)), _M_start(), _M_finish(), - _M_end_of_storage() - { } + _Bvector_impl(_Bvector_impl&&) = default; #endif _Bit_type* _M_end_addr() const _GLIBCXX_NOEXCEPT { - if (_M_end_of_storage) - return std::__addressof(_M_end_of_storage[-1]) + 1; + if (this->_M_end_of_storage) + return std::__addressof(this->_M_end_of_storage[-1]) + 1; return 0; } }; @@ -462,23 +501,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER get_allocator() const _GLIBCXX_NOEXCEPT { return allocator_type(_M_get_Bit_allocator()); } +#if __cplusplus >= 201103L + _Bvector_base() = default; +#else _Bvector_base() : _M_impl() { } +#endif _Bvector_base(const allocator_type& __a) : _M_impl(__a) { } #if __cplusplus >= 201103L - _Bvector_base(_Bvector_base&& __x) noexcept - : _M_impl(std::move(__x._M_get_Bit_allocator())) - { - this->_M_impl._M_start = __x._M_impl._M_start; - this->_M_impl._M_finish = __x._M_impl._M_finish; - this->_M_impl._M_end_of_storage = __x._M_impl._M_end_of_storage; - __x._M_impl._M_start = _Bit_iterator(); - __x._M_impl._M_finish = _Bit_iterator(); - __x._M_impl._M_end_of_storage = nullptr; - } + _Bvector_base(_Bvector_base&&) = default; #endif ~_Bvector_base() @@ -505,6 +539,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER } } +#if __cplusplus >= 201103L + void + _M_move_data(_Bvector_base&& __x) noexcept + { _M_impl._M_move_data(std::move(__x._M_impl)); } +#endif + static size_t _S_nword(size_t __n) { return (__n + int(_S_word_bit) - 1) / int(_S_word_bit); } @@ -564,7 +604,8 @@ template<typename _Alloc> typedef std::reverse_iterator<iterator> reverse_iterator; typedef _Alloc allocator_type; - allocator_type get_allocator() const + allocator_type + get_allocator() const { return _Base::get_allocator(); } protected: @@ -574,11 +615,12 @@ template<typename _Alloc> using _Base::_M_get_Bit_allocator; public: - vector() #if __cplusplus >= 201103L - noexcept(is_nothrow_default_constructible<allocator_type>::value) -#endif + vector() = default; +#else + vector() : _Base() { } +#endif explicit vector(const allocator_type& __a) @@ -592,23 +634,16 @@ template<typename _Alloc> vector(size_type __n, const bool& __value, const allocator_type& __a = allocator_type()) - : _Base(__a) - { - _M_initialize(__n); - std::fill(this->_M_impl._M_start._M_p, this->_M_impl._M_end_addr(), - __value ? ~0 : 0); - } #else explicit vector(size_type __n, const bool& __value = bool(), const allocator_type& __a = allocator_type()) +#endif : _Base(__a) { _M_initialize(__n); - std::fill(this->_M_impl._M_start._M_p, this->_M_impl._M_end_addr(), - __value ? ~0 : 0); + _M_initialize_value(__value); } -#endif vector(const vector& __x) : _Base(_Bit_alloc_traits::_S_select_on_copy(__x._M_get_Bit_allocator())) @@ -618,22 +653,14 @@ template<typename _Alloc> } #if __cplusplus >= 201103L - vector(vector&& __x) noexcept - : _Base(std::move(__x)) { } + vector(vector&&) = default; vector(vector&& __x, const allocator_type& __a) noexcept(_Bit_alloc_traits::_S_always_equal()) : _Base(__a) { if (__x.get_allocator() == __a) - { - this->_M_impl._M_start = __x._M_impl._M_start; - this->_M_impl._M_finish = __x._M_impl._M_finish; - this->_M_impl._M_end_of_storage = __x._M_impl._M_end_of_storage; - __x._M_impl._M_start = _Bit_iterator(); - __x._M_impl._M_finish = _Bit_iterator(); - __x._M_impl._M_end_of_storage = nullptr; - } + this->_M_move_data(std::move(__x)); else { _M_initialize(__x.size()); @@ -716,12 +743,7 @@ template<typename _Alloc> || this->_M_get_Bit_allocator() == __x._M_get_Bit_allocator()) { this->_M_deallocate(); - this->_M_impl._M_start = __x._M_impl._M_start; - this->_M_impl._M_finish = __x._M_impl._M_finish; - this->_M_impl._M_end_of_storage = __x._M_impl._M_end_of_storage; - __x._M_impl._M_start = _Bit_iterator(); - __x._M_impl._M_finish = _Bit_iterator(); - __x._M_impl._M_end_of_storage = nullptr; + this->_M_move_data(std::move(__x)); std::__alloc_on_move(_M_get_Bit_allocator(), __x._M_get_Bit_allocator()); } @@ -760,7 +782,7 @@ template<typename _Alloc> typename = std::_RequireInputIter<_InputIterator>> void assign(_InputIterator __first, _InputIterator __last) - { _M_assign_dispatch(__first, __last, __false_type()); } + { _M_assign_aux(__first, __last, std::__iterator_category(__first)); } #else template<typename _InputIterator> void @@ -774,7 +796,7 @@ template<typename _Alloc> #if __cplusplus >= 201103L void assign(initializer_list<bool> __l) - { this->assign(__l.begin(), __l.end()); } + { _M_assign_aux(__l.begin(), __l.end(), random_access_iterator_tag()); } #endif iterator @@ -1096,6 +1118,14 @@ template<typename _Alloc> } void + _M_initialize_value(bool __x) + { + __builtin_memset(this->_M_impl._M_start._M_p, __x ? ~0 : 0, + (this->_M_impl._M_end_addr() - this->_M_impl._M_start._M_p) + * sizeof(_Bit_type)); + } + + void _M_reallocate(size_type __n); #if __cplusplus >= 201103L @@ -1112,8 +1142,7 @@ template<typename _Alloc> _M_initialize_dispatch(_Integer __n, _Integer __x, __true_type) { _M_initialize(static_cast<size_type>(__n)); - std::fill(this->_M_impl._M_start._M_p, - this->_M_impl._M_end_addr(), __x ? ~0 : 0); + _M_initialize_value(__x); } template<typename _InputIterator> @@ -1160,15 +1189,13 @@ template<typename _Alloc> { if (__n > size()) { - std::fill(this->_M_impl._M_start._M_p, - this->_M_impl._M_end_addr(), __x ? ~0 : 0); + _M_initialize_value(__x); insert(end(), __n - size(), __x); } else { _M_erase_at_end(begin() + __n); - std::fill(this->_M_impl._M_start._M_p, - this->_M_impl._M_end_addr(), __x ? ~0 : 0); + _M_initialize_value(__x); } }