diff mbox series

[_Hashtable] Use RAII to restore Rehash state

Message ID 7f61df18-dd99-4ff5-9fcd-8ca7820403d4@gmail.com
State New
Headers show
Series [_Hashtable] Use RAII to restore Rehash state | expand

Commit Message

François Dumont Oct. 26, 2023, 5:18 a.m. UTC
libstdc++: [_Hashtable] Use RAII type to manage rehash functor state

     Replace usage of __try/__catch with a RAII type to restore rehash 
functor
     state when needed.

     libstdc++-v3/ChangeLog:

             * include/bits/hashtable_policy.h (_RehashStateGuard): New.
             (_Insert_base<>::_M_insert_range(_IIt, _IIt, const 
_NodeGet&, false_type)):
             Adapt.
             * include/bits/hashtable.h (__rehash_guard_t): New.
             (__rehash_state): Remove.
             (_M_rehash): Remove.
             (_M_rehash_aux): Rename into _M_rehash.
             (_M_assign_elements, _M_insert_unique_node, 
_M_insert_multi_node): Adapt.
             (rehash): Adapt.


Tested under Linux x64.

Ok to commit ?

François

Comments

Jonathan Wakely Oct. 26, 2023, 10:14 a.m. UTC | #1
On Thu, 26 Oct 2023 at 06:18, François Dumont <frs.dumont@gmail.com> wrote:

>      libstdc++: [_Hashtable] Use RAII type to manage rehash functor state
>
>      Replace usage of __try/__catch with a RAII type to restore rehash
> functor
>      state when needed.
>

I'm reviewing this now, but could I request that you attach patches as .txt
files in gmail please?

When you attach a .patch file gmail decides to give it content-type
text/x-patch and base64 encode it, and set content-disposition: attachment,
which mean it looks like this when received:

https://inbox.sourceware.org/libstdc++/7f61df18-dd99-4ff5-9fcd-8ca7820403d4@gmail.com/raw

It's hard to reply inline when the patch needs to be downloaded separately.

If you name the file .txt then gmail just shows it in the mail body
(conetnt-disposition: inline) and it's much easier to reply.

Anyway, I'll finish reviewing this one now that I've downloaded it and
manually pasted the patch into my reply.



>
>      libstdc++-v3/ChangeLog:
>
>              * include/bits/hashtable_policy.h (_RehashStateGuard): New.
>              (_Insert_base<>::_M_insert_range(_IIt, _IIt, const
> _NodeGet&, false_type)):
>              Adapt.
>              * include/bits/hashtable.h (__rehash_guard_t): New.
>              (__rehash_state): Remove.
>              (_M_rehash): Remove.
>              (_M_rehash_aux): Rename into _M_rehash.
>              (_M_assign_elements, _M_insert_unique_node,
> _M_insert_multi_node): Adapt.
>              (rehash): Adapt.
>
>
> Tested under Linux x64.
>
> Ok to commit ?
>
> François
>
Jonathan Wakely Oct. 26, 2023, 10:43 a.m. UTC | #2
On 26/10/23 07:18 +0200, François Dumont wrote:
>    libstdc++: [_Hashtable] Use RAII type to manage rehash functor state
>
>    Replace usage of __try/__catch with a RAII type to restore rehash 
>functor
>    state when needed.

Generally I really like replacing try-catch with RAII but I have some
questions below.

>    libstdc++-v3/ChangeLog:
>
>            * include/bits/hashtable_policy.h (_RehashStateGuard): New.
>            (_Insert_base<>::_M_insert_range(_IIt, _IIt, const 
>_NodeGet&, false_type)):
>            Adapt.
>            * include/bits/hashtable.h (__rehash_guard_t): New.
>            (__rehash_state): Remove.
>            (_M_rehash): Remove.
>            (_M_rehash_aux): Rename into _M_rehash.
>            (_M_assign_elements, _M_insert_unique_node, 
>_M_insert_multi_node): Adapt.
>            (rehash): Adapt.
>
>
>Tested under Linux x64.
>
>Ok to commit ?
>
>François

>diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
>index 0857448f7ed..64071ac1fb2 100644
>--- a/libstdc++-v3/include/bits/hashtable.h
>+++ b/libstdc++-v3/include/bits/hashtable.h
>@@ -234,6 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 					      _RehashPolicy, _Traits>;
>       using __enable_default_ctor
> 	= _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
>+      using __rehash_guard_t
>+	= __detail::_RehashStateGuard<_RehashPolicy>;
> 
>     public:
>       typedef _Key						key_type;
>@@ -264,7 +266,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 
>     private:
>       using __rehash_type = _RehashPolicy;
>-      using __rehash_state = typename __rehash_type::_State;
> 
>       using __unique_keys = typename __traits_type::__unique_keys;
> 
>@@ -1200,14 +1201,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 
>     private:
>       // Helper rehash method used when keys are unique.
>-      void _M_rehash_aux(size_type __bkt_count, true_type __uks);
>+      void _M_rehash(size_type __bkt_count, true_type __uks);
> 
>       // Helper rehash method used when keys can be non-unique.
>-      void _M_rehash_aux(size_type __bkt_count, false_type __uks);
>-
>-      // Unconditionally change size of bucket array to n, restore
>-      // hash policy state to __state on exception.
>-      void _M_rehash(size_type __bkt_count, const __rehash_state& __state);
>+      void _M_rehash(size_type __bkt_count, false_type __uks);
>     };
> 
>   // Definitions of class template _Hashtable's out-of-line member functions.
>@@ -1337,7 +1334,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       {
> 	__buckets_ptr __former_buckets = nullptr;
> 	std::size_t __former_bucket_count = _M_bucket_count;
>-	const __rehash_state& __former_state = _M_rehash_policy._M_state();
>+	__rehash_guard_t __rehash_guard(_M_rehash_policy);
> 
> 	if (_M_bucket_count != __ht._M_bucket_count)
> 	  {
>@@ -1359,6 +1356,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	    _M_assign(std::forward<_Ht>(__ht), __roan);
> 	    if (__former_buckets)
> 	      _M_deallocate_buckets(__former_buckets, __former_bucket_count);
>+	    __rehash_guard._M_reset = false;

I find this confusing. Usually "reset" means that something is
cleared, so won't take an action in the destructor. e.g. if you use
std::unique_ptr::reset() then the object is destroyed immediately, and
then nothing happens in the destructor. Here it's the opposite,
_M_reset=true means that it _sould_ do something in the destructor.

The problem is the ambiguity between "reset the state in the
destructor later" and "reset the object to an empty state now".

If the member was called _M_guarded then it might be clearer.
_M_guarded=true means the guard is active, and will restore the state
later. Or _M_active, or _M_armed, or even _M_reset_in_dtor. Any of
those names avoids the confusion with the semantics of
std::unique_ptr::reset() and similar well-known APIs.

Or what I usually do is store a pointer to the guarded object in the
RAII guard type, and then just null the pointer to disarm the guard.
That means you don't need a separate bool member variable. If
_RehashStateGuard::_M_rehash_policy was called _M_guarded_obj and was
a _RehashPolicy* instead of _RehashPolicy& then disarming it would be:

        __rehash_guard._M_guarded_obj = nullptr;

This seems clear to me, as it says that the guard no longer has
anything to guard, so won't do anything in the destructor.


> 	  }
> 	__catch(...)
> 	  {
>@@ -1366,7 +1364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	      {
> 		// Restore previous buckets.
> 		_M_deallocate_buckets();
>-		_M_rehash_policy._M_reset(__former_state);
> 		_M_buckets = __former_buckets;
> 		_M_bucket_count = __former_bucket_count;
> 	      }
>@@ -2142,17 +2139,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 			  __node_ptr __node, size_type __n_elt)
>     -> iterator
>     {
>-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
>+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>       std::pair<bool, std::size_t> __do_rehash
> 	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
> 					  __n_elt);
> 
>       if (__do_rehash.first)
> 	{
>-	  _M_rehash(__do_rehash.second, __saved_state);
>+	  _M_rehash(__do_rehash.second, true_type{});
> 	  __bkt = _M_bucket_index(__code);
> 	}
> 
>+      __rehash_guard._M_reset = false;
>       this->_M_store_code(*__node, __code);

This changes behaviour. Previously the try-catch was inside the call to
_M_rehash. Now we're starting to guard before calling _M_need_rehash,
and we run the destructor unconditionally even if we never call
_M_rehash. I think that's OK, because _M_need_rehash is noexcept (but
it's confusing because we have a const overload of _M_need_rehash
which is noexcept(false) and a non-const overload which is
noexcept(true) ... should they both be noexcept?)

Can we do this instead:


-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
        std::pair<bool, std::size_t> __do_rehash
  	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
  					  __n_elt);
  
        if (__do_rehash.first)
  	{
+	  __rehash_guard_t __rehash_guard(_M_rehash_policy);
-	  _M_rehash(__do_rehash.second, __saved_state);
+	  _M_rehash(__do_rehash.second, true_type{});
+	  __rehash_guard._M_reset = false;
  	  __bkt = _M_bucket_index(__code);
  	}
  
        this->_M_store_code(*__node, __code);

This only creates a guard object if we're actually going to rehash, so
we don't run its destructor (and check if we need t restore anything)
when no rehash is needed.

N.B. _M_bucket_index is also confusing. There are two overloads of it,
one is noexcept and one isn't. The one that we call here is
noexcept(false), so that call could throw ... is it OK that we don't
restore the old state if that happens? Should that be guarded too? (I
have no idea).

Also, I see that the noexcept(true) overload of _M_bucket_index calls
__hash_code_base::_M_bucket_index, which is noexcept(/* condition */),
so the caller says it cannot throw, but it calls something that
sometimes throws. Is that correct?

> 
>       // Always insert at the beginning of the bucket.
>@@ -2172,13 +2170,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 			 __hash_code __code, __node_ptr __node)
>     -> iterator
>     {
>-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
>+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>       std::pair<bool, std::size_t> __do_rehash
> 	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count, 1);
> 
>       if (__do_rehash.first)
>-	_M_rehash(__do_rehash.second, __saved_state);
>+	_M_rehash(__do_rehash.second, false_type{});
> 
>+      __rehash_guard._M_reset = false;

Again, we're creating the guard object in a larger scope than needed.

>       this->_M_store_code(*__node, __code);
>       const key_type& __k = _ExtractKey{}(__node->_M_v());
>       size_type __bkt = _M_bucket_index(__code);
>@@ -2509,39 +2508,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>     rehash(size_type __bkt_count)
>     {
>-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
>+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>       __bkt_count
> 	= std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count + 1),
> 		   __bkt_count);
>       __bkt_count = _M_rehash_policy._M_next_bkt(__bkt_count);
> 
>       if (__bkt_count != _M_bucket_count)
>-	_M_rehash(__bkt_count, __saved_state);
>-      else
>-	// No rehash, restore previous state to keep it consistent with
>-	// container state.
>-	_M_rehash_policy._M_reset(__saved_state);
>-    }
>-
>-  template<typename _Key, typename _Value, typename _Alloc,
>-	   typename _ExtractKey, typename _Equal,
>-	   typename _Hash, typename _RangeHash, typename _Unused,
>-	   typename _RehashPolicy, typename _Traits>
>-    void
>-    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>-	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>-    _M_rehash(size_type __bkt_count, const __rehash_state& __state)
>-    {
>-      __try
>-	{
>-	  _M_rehash_aux(__bkt_count, __unique_keys{});
>-	}
>-      __catch(...)
> 	{
>-	  // A failure here means that buckets allocation failed.  We only
>-	  // have to restore hash policy previous state.
>-	  _M_rehash_policy._M_reset(__state);
>-	  __throw_exception_again;
>+	  _M_rehash(__bkt_count, __unique_keys{});
>+	  __rehash_guard._M_reset = false;

And a larger scope again.

> 	}
>     }
> 
>@@ -2553,7 +2529,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     void
>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>-    _M_rehash_aux(size_type __bkt_count, true_type /* __uks */)
>+    _M_rehash(size_type __bkt_count, true_type /* __uks */)
>     {
>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
>       __node_ptr __p = _M_begin();
>@@ -2596,7 +2572,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     void
>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>-    _M_rehash_aux(size_type __bkt_count, false_type /* __uks */)
>+    _M_rehash(size_type __bkt_count, false_type /* __uks */)
>     {
>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
>       __node_ptr __p = _M_begin();
>diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
>index 5d162463dc3..8b9626b1575 100644
>--- a/libstdc++-v3/include/bits/hashtable_policy.h
>+++ b/libstdc++-v3/include/bits/hashtable_policy.h
>@@ -715,6 +715,26 @@ namespace __detail
>     std::size_t	_M_next_resize;
>   };
> 
>+  template<typename _RehashPolicy>
>+    struct _RehashStateGuard
>+    {
>+      _RehashPolicy& _M_rehash_policy;
>+      typename _RehashPolicy::_State _M_prev_state;
>+      bool _M_reset = true;

This could be:

+      _RehashPolicy* _M_guarded_obj;
+      typename _RehashPolicy::_State _M_prev_state;

>+
>+      _RehashStateGuard(_RehashPolicy& __policy)
>+      : _M_rehash_policy(__policy)
>+      , _M_prev_state(__policy._M_state())

+      _RehashStateGuard(_RehashPolicy& __policy)
+      : _M_guarded_obj(std::__addressof(__policy))
+      , _M_prev_state(__policy._M_state())

>+      { }
>+      _RehashStateGuard(const _RehashStateGuard&) = delete;
>+
>+      ~_RehashStateGuard()
>+      {
>+	if (_M_reset)
>+	  _M_rehash_policy._M_reset(_M_prev_state);

+	if (_M_guarded_obj)
+	  _M_guarded_obj->_M_reset(_M_prev_state);

>+      }
>+    };
>+
>   // Base classes for std::_Hashtable.  We define these base classes
>   // because in some cases we want to do different things depending on
>   // the value of a policy class.  In some cases the policy class
>@@ -1007,7 +1027,7 @@ namespace __detail
> 		      const _NodeGetter& __node_gen, false_type __uks)
>       {
> 	using __rehash_type = typename __hashtable::__rehash_type;
>-	using __rehash_state = typename __hashtable::__rehash_state;
>+	using __rehash_guard_t = typename __hashtable::__rehash_guard_t;
> 	using pair_type = std::pair<bool, std::size_t>;
> 
> 	size_type __n_elt = __detail::__distance_fw(__first, __last);
>@@ -1016,14 +1036,15 @@ namespace __detail
> 
> 	__hashtable& __h = _M_conjure_hashtable();
> 	__rehash_type& __rehash = __h._M_rehash_policy;
>-	const __rehash_state& __saved_state = __rehash._M_state();
>+	__rehash_guard_t __rehash_guard(__rehash);
> 	pair_type __do_rehash = __rehash._M_need_rehash(__h._M_bucket_count,
> 							__h._M_element_count,
> 							__n_elt);
> 
> 	if (__do_rehash.first)
>-	  __h._M_rehash(__do_rehash.second, __saved_state);
>+	  __h._M_rehash(__do_rehash.second, __uks);
> 
>+	__rehash_guard._M_reset = false;

A larger scope again.

> 	for (; __first != __last; ++__first)
> 	  __h._M_insert(*__first, __node_gen, __uks);
>       }
François Dumont Oct. 26, 2023, 8:52 p.m. UTC | #3
On 26/10/2023 12:43, Jonathan Wakely wrote:
> On 26/10/23 07:18 +0200, François Dumont wrote:
>>     libstdc++: [_Hashtable] Use RAII type to manage rehash functor state
>>
>>     Replace usage of __try/__catch with a RAII type to restore rehash 
>> functor
>>     state when needed.
>
> Generally I really like replacing try-catch with RAII but I have some
> questions below.
>
>>     libstdc++-v3/ChangeLog:
>>
>>             * include/bits/hashtable_policy.h (_RehashStateGuard): New.
>>             (_Insert_base<>::_M_insert_range(_IIt, _IIt, const 
>> _NodeGet&, false_type)):
>>             Adapt.
>>             * include/bits/hashtable.h (__rehash_guard_t): New.
>>             (__rehash_state): Remove.
>>             (_M_rehash): Remove.
>>             (_M_rehash_aux): Rename into _M_rehash.
>>             (_M_assign_elements, _M_insert_unique_node, 
>> _M_insert_multi_node): Adapt.
>>             (rehash): Adapt.
>>
>>
>> Tested under Linux x64.
>>
>> Ok to commit ?
>>
>> François
>
>> diff --git a/libstdc++-v3/include/bits/hashtable.h 
>> b/libstdc++-v3/include/bits/hashtable.h
>> index 0857448f7ed..64071ac1fb2 100644
>> --- a/libstdc++-v3/include/bits/hashtable.h
>> +++ b/libstdc++-v3/include/bits/hashtable.h
>> @@ -234,6 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>                           _RehashPolicy, _Traits>;
>>       using __enable_default_ctor
>>     = _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
>> +      using __rehash_guard_t
>> +    = __detail::_RehashStateGuard<_RehashPolicy>;
>>
>>     public:
>>       typedef _Key                        key_type;
>> @@ -264,7 +266,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>
>>     private:
>>       using __rehash_type = _RehashPolicy;
>> -      using __rehash_state = typename __rehash_type::_State;
>>
>>       using __unique_keys = typename __traits_type::__unique_keys;
>>
>> @@ -1200,14 +1201,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>
>>     private:
>>       // Helper rehash method used when keys are unique.
>> -      void _M_rehash_aux(size_type __bkt_count, true_type __uks);
>> +      void _M_rehash(size_type __bkt_count, true_type __uks);
>>
>>       // Helper rehash method used when keys can be non-unique.
>> -      void _M_rehash_aux(size_type __bkt_count, false_type __uks);
>> -
>> -      // Unconditionally change size of bucket array to n, restore
>> -      // hash policy state to __state on exception.
>> -      void _M_rehash(size_type __bkt_count, const __rehash_state& 
>> __state);
>> +      void _M_rehash(size_type __bkt_count, false_type __uks);
>>     };
>>
>>   // Definitions of class template _Hashtable's out-of-line member 
>> functions.
>> @@ -1337,7 +1334,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>       {
>>     __buckets_ptr __former_buckets = nullptr;
>>     std::size_t __former_bucket_count = _M_bucket_count;
>> -    const __rehash_state& __former_state = _M_rehash_policy._M_state();
>> +    __rehash_guard_t __rehash_guard(_M_rehash_policy);
>>
>>     if (_M_bucket_count != __ht._M_bucket_count)
>>       {
>> @@ -1359,6 +1356,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>         _M_assign(std::forward<_Ht>(__ht), __roan);
>>         if (__former_buckets)
>>           _M_deallocate_buckets(__former_buckets, 
>> __former_bucket_count);
>> +        __rehash_guard._M_reset = false;
>
> I find this confusing. Usually "reset" means that something is
> cleared, so won't take an action in the destructor. e.g. if you use
> std::unique_ptr::reset() then the object is destroyed immediately, and
> then nothing happens in the destructor. Here it's the opposite,
> _M_reset=true means that it _sould_ do something in the destructor.
>
> The problem is the ambiguity between "reset the state in the
> destructor later" and "reset the object to an empty state now".
>
> If the member was called _M_guarded then it might be clearer.
> _M_guarded=true means the guard is active, and will restore the state
> later. Or _M_active, or _M_armed, or even _M_reset_in_dtor. Any of
> those names avoids the confusion with the semantics of
> std::unique_ptr::reset() and similar well-known APIs.
>
> Or what I usually do is store a pointer to the guarded object in the
> RAII guard type, and then just null the pointer to disarm the guard.
> That means you don't need a separate bool member variable. If
> _RehashStateGuard::_M_rehash_policy was called _M_guarded_obj and was
> a _RehashPolicy* instead of _RehashPolicy& then disarming it would be:
>
>        __rehash_guard._M_guarded_obj = nullptr;
>
> This seems clear to me, as it says that the guard no longer has
> anything to guard, so won't do anything in the destructor.
>
Looks clearer to me too.

I started with a _RehashStateScope and a _M_commit() method, it could 
have been worst :-)


>
>>       }
>>     __catch(...)
>>       {
>> @@ -1366,7 +1364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>           {
>>         // Restore previous buckets.
>>         _M_deallocate_buckets();
>> -        _M_rehash_policy._M_reset(__former_state);
>>         _M_buckets = __former_buckets;
>>         _M_bucket_count = __former_bucket_count;
>>           }
>> @@ -2142,17 +2139,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>               __node_ptr __node, size_type __n_elt)
>>     -> iterator
>>     {
>> -      const __rehash_state& __saved_state = 
>> _M_rehash_policy._M_state();
>> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>>       std::pair<bool, std::size_t> __do_rehash
>>     = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
>>                       __n_elt);
>>
>>       if (__do_rehash.first)
>>     {
>> -      _M_rehash(__do_rehash.second, __saved_state);
>> +      _M_rehash(__do_rehash.second, true_type{});
>>       __bkt = _M_bucket_index(__code);
>>     }
>>
>> +      __rehash_guard._M_reset = false;
>>       this->_M_store_code(*__node, __code);
>
> This changes behaviour. Previously the try-catch was inside the call to
> _M_rehash. Now we're starting to guard before calling _M_need_rehash,

We were capturing _M_rehash_policy state before the call to 
_M_need_rehash so it's not different.

We really need to instantiate the guard before we call _M_need_rehash 
cause it is the method changing the _M_rehash_policy state.

> and we run the destructor unconditionally even if we never call
> _M_rehash.

Here we only want to reset rehash state on exception.

Note that with the 2 rehash policy implementations we provide it doesn't 
matter if we reset or not when we do not call _M_rehash because their 
state changes only when rehash is needed. But some user's fancy rehash 
policy might update their state even if no rehash is requested and in 
this case it is better to reset this state only on exception.


> I think that's OK, because _M_need_rehash is noexcept (but
> it's confusing because we have a const overload of _M_need_rehash
> which is noexcept(false) and a non-const overload which is
> noexcept(true) ... should they both be noexcept?)

That's another subject but yes, _M_need_rehash should be non-const and 
noexcept on both rehash policy implementations. I considered doing such 
a thing but won't it impact abi as _Prime_rehash_policy._M_need_rehash 
symbols is coming from the lib ?


>
> Can we do this instead:
>
>
> -      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
>        std::pair<bool, std::size_t> __do_rehash
>      = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
>                        __n_elt);
>
>        if (__do_rehash.first)
>      {
> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);

Clearly no, we need to capture state before _M_need_rehash. Like it was 
done with the __saved_state above.


> -      _M_rehash(__do_rehash.second, __saved_state);
> +      _M_rehash(__do_rehash.second, true_type{});
> +      __rehash_guard._M_reset = false;
>        __bkt = _M_bucket_index(__code);
>      }
>
>        this->_M_store_code(*__node, __code);
>
> This only creates a guard object if we're actually going to rehash, so
> we don't run its destructor (and check if we need t restore anything)
> when no rehash is needed.
>
> N.B. _M_bucket_index is also confusing. There are two overloads of it,
> one is noexcept and one isn't. The one that we call here is
> noexcept(false), so that call could throw ... is it OK that we don't
> restore the old state if that happens? Should that be guarded too? (I
> have no idea).
>
> Also, I see that the noexcept(true) overload of _M_bucket_index calls
> __hash_code_base::_M_bucket_index, which is noexcept(/* condition */),
> so the caller says it cannot throw, but it calls something that
> sometimes throws. Is that correct?

It can be noexcept because usage of hash code cache depends on either 
hash functor is noexcept or not.

>
>>
>>       // Always insert at the beginning of the bucket.
>> @@ -2172,13 +2170,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>              __hash_code __code, __node_ptr __node)
>>     -> iterator
>>     {
>> -      const __rehash_state& __saved_state = 
>> _M_rehash_policy._M_state();
>> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>>       std::pair<bool, std::size_t> __do_rehash
>>     = _M_rehash_policy._M_need_rehash(_M_bucket_count, 
>> _M_element_count, 1);
>>
>>       if (__do_rehash.first)
>> -    _M_rehash(__do_rehash.second, __saved_state);
>> +    _M_rehash(__do_rehash.second, false_type{});
>>
>> +      __rehash_guard._M_reset = false;
>
> Again, we're creating the guard object in a larger scope than needed.
>
>>       this->_M_store_code(*__node, __code);
>>       const key_type& __k = _ExtractKey{}(__node->_M_v());
>>       size_type __bkt = _M_bucket_index(__code);
>> @@ -2509,39 +2508,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>>     rehash(size_type __bkt_count)
>>     {
>> -      const __rehash_state& __saved_state = 
>> _M_rehash_policy._M_state();
>> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>>       __bkt_count
>>     = std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count 
>> + 1),
>>            __bkt_count);
>>       __bkt_count = _M_rehash_policy._M_next_bkt(__bkt_count);
>>
>>       if (__bkt_count != _M_bucket_count)
>> -    _M_rehash(__bkt_count, __saved_state);
>> -      else
>> -    // No rehash, restore previous state to keep it consistent with
>> -    // container state.
>> -    _M_rehash_policy._M_reset(__saved_state);
>> -    }
>> -
>> -  template<typename _Key, typename _Value, typename _Alloc,
>> -       typename _ExtractKey, typename _Equal,
>> -       typename _Hash, typename _RangeHash, typename _Unused,
>> -       typename _RehashPolicy, typename _Traits>
>> -    void
>> -    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>> -           _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>> -    _M_rehash(size_type __bkt_count, const __rehash_state& __state)
>> -    {
>> -      __try
>> -    {
>> -      _M_rehash_aux(__bkt_count, __unique_keys{});
>> -    }
>> -      __catch(...)
>>     {
>> -      // A failure here means that buckets allocation failed. We only
>> -      // have to restore hash policy previous state.
>> -      _M_rehash_policy._M_reset(__state);
>> -      __throw_exception_again;
>> +      _M_rehash(__bkt_count, __unique_keys{});
>> +      __rehash_guard._M_reset = false;
>
> And a larger scope again.
>
>>     }
>>     }
>>
>> @@ -2553,7 +2529,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>     void
>>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>> -    _M_rehash_aux(size_type __bkt_count, true_type /* __uks */)
>> +    _M_rehash(size_type __bkt_count, true_type /* __uks */)
>>     {
>>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
>>       __node_ptr __p = _M_begin();
>> @@ -2596,7 +2572,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>     void
>>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>> -    _M_rehash_aux(size_type __bkt_count, false_type /* __uks */)
>> +    _M_rehash(size_type __bkt_count, false_type /* __uks */)
>>     {
>>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
>>       __node_ptr __p = _M_begin();
>> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>> b/libstdc++-v3/include/bits/hashtable_policy.h
>> index 5d162463dc3..8b9626b1575 100644
>> --- a/libstdc++-v3/include/bits/hashtable_policy.h
>> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
>> @@ -715,6 +715,26 @@ namespace __detail
>>     std::size_t    _M_next_resize;
>>   };
>>
>> +  template<typename _RehashPolicy>
>> +    struct _RehashStateGuard
>> +    {
>> +      _RehashPolicy& _M_rehash_policy;
>> +      typename _RehashPolicy::_State _M_prev_state;
>> +      bool _M_reset = true;
>
> This could be:
>
> +      _RehashPolicy* _M_guarded_obj;
> +      typename _RehashPolicy::_State _M_prev_state;
>
>> +
>> +      _RehashStateGuard(_RehashPolicy& __policy)
>> +      : _M_rehash_policy(__policy)
>> +      , _M_prev_state(__policy._M_state())
>
> +      _RehashStateGuard(_RehashPolicy& __policy)
> +      : _M_guarded_obj(std::__addressof(__policy))
> +      , _M_prev_state(__policy._M_state())
>
>> +      { }
>> +      _RehashStateGuard(const _RehashStateGuard&) = delete;
>> +
>> +      ~_RehashStateGuard()
>> +      {
>> +    if (_M_reset)
>> +      _M_rehash_policy._M_reset(_M_prev_state);
>
> +    if (_M_guarded_obj)
> +      _M_guarded_obj->_M_reset(_M_prev_state);
>
>> +      }
>> +    };
>> +
>>   // Base classes for std::_Hashtable.  We define these base classes
>>   // because in some cases we want to do different things depending on
>>   // the value of a policy class.  In some cases the policy class
>> @@ -1007,7 +1027,7 @@ namespace __detail
>>               const _NodeGetter& __node_gen, false_type __uks)
>>       {
>>     using __rehash_type = typename __hashtable::__rehash_type;
>> -    using __rehash_state = typename __hashtable::__rehash_state;
>> +    using __rehash_guard_t = typename __hashtable::__rehash_guard_t;
>>     using pair_type = std::pair<bool, std::size_t>;
>>
>>     size_type __n_elt = __detail::__distance_fw(__first, __last);
>> @@ -1016,14 +1036,15 @@ namespace __detail
>>
>>     __hashtable& __h = _M_conjure_hashtable();
>>     __rehash_type& __rehash = __h._M_rehash_policy;
>> -    const __rehash_state& __saved_state = __rehash._M_state();
>> +    __rehash_guard_t __rehash_guard(__rehash);
>>     pair_type __do_rehash = __rehash._M_need_rehash(__h._M_bucket_count,
>>                             __h._M_element_count,
>>                             __n_elt);
>>
>>     if (__do_rehash.first)
>> -      __h._M_rehash(__do_rehash.second, __saved_state);
>> +      __h._M_rehash(__do_rehash.second, __uks);
>>
>> +    __rehash_guard._M_reset = false;
>
> A larger scope again.

Note that there is only one place where we reset rehash policy state 
even if no exception took place. It is in the std rehash method when the 
user requests a rehash that has no effect.

Here is the updated patch, ok to commit ?
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 0857448f7ed..89132430f3e 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -234,6 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 					      _RehashPolicy, _Traits>;
       using __enable_default_ctor
 	= _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
+      using __rehash_guard_t
+	= __detail::_RehashStateGuard<_RehashPolicy>;
 
     public:
       typedef _Key						key_type;
@@ -264,7 +266,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     private:
       using __rehash_type = _RehashPolicy;
-      using __rehash_state = typename __rehash_type::_State;
 
       using __unique_keys = typename __traits_type::__unique_keys;
 
@@ -1200,14 +1201,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     private:
       // Helper rehash method used when keys are unique.
-      void _M_rehash_aux(size_type __bkt_count, true_type __uks);
+      void _M_rehash(size_type __bkt_count, true_type __uks);
 
       // Helper rehash method used when keys can be non-unique.
-      void _M_rehash_aux(size_type __bkt_count, false_type __uks);
-
-      // Unconditionally change size of bucket array to n, restore
-      // hash policy state to __state on exception.
-      void _M_rehash(size_type __bkt_count, const __rehash_state& __state);
+      void _M_rehash(size_type __bkt_count, false_type __uks);
     };
 
   // Definitions of class template _Hashtable's out-of-line member functions.
@@ -1337,7 +1334,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
 	__buckets_ptr __former_buckets = nullptr;
 	std::size_t __former_bucket_count = _M_bucket_count;
-	const __rehash_state& __former_state = _M_rehash_policy._M_state();
+	__rehash_guard_t __rehash_guard(_M_rehash_policy);
 
 	if (_M_bucket_count != __ht._M_bucket_count)
 	  {
@@ -1359,6 +1356,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    _M_assign(std::forward<_Ht>(__ht), __roan);
 	    if (__former_buckets)
 	      _M_deallocate_buckets(__former_buckets, __former_bucket_count);
+	    __rehash_guard._M_guarded_obj = nullptr;
 	  }
 	__catch(...)
 	  {
@@ -1366,7 +1364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	      {
 		// Restore previous buckets.
 		_M_deallocate_buckets();
-		_M_rehash_policy._M_reset(__former_state);
 		_M_buckets = __former_buckets;
 		_M_bucket_count = __former_bucket_count;
 	      }
@@ -2142,17 +2139,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 			  __node_ptr __node, size_type __n_elt)
     -> iterator
     {
-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
       std::pair<bool, std::size_t> __do_rehash
 	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
 					  __n_elt);
 
       if (__do_rehash.first)
 	{
-	  _M_rehash(__do_rehash.second, __saved_state);
+	  _M_rehash(__do_rehash.second, true_type{});
 	  __bkt = _M_bucket_index(__code);
 	}
 
+      __rehash_guard._M_guarded_obj = nullptr;
       this->_M_store_code(*__node, __code);
 
       // Always insert at the beginning of the bucket.
@@ -2172,13 +2170,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 			 __hash_code __code, __node_ptr __node)
     -> iterator
     {
-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
       std::pair<bool, std::size_t> __do_rehash
 	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count, 1);
 
       if (__do_rehash.first)
-	_M_rehash(__do_rehash.second, __saved_state);
+	_M_rehash(__do_rehash.second, false_type{});
 
+      __rehash_guard._M_guarded_obj = nullptr;
       this->_M_store_code(*__node, __code);
       const key_type& __k = _ExtractKey{}(__node->_M_v());
       size_type __bkt = _M_bucket_index(__code);
@@ -2509,39 +2508,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
     rehash(size_type __bkt_count)
     {
-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
       __bkt_count
 	= std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count + 1),
 		   __bkt_count);
       __bkt_count = _M_rehash_policy._M_next_bkt(__bkt_count);
 
       if (__bkt_count != _M_bucket_count)
-	_M_rehash(__bkt_count, __saved_state);
-      else
-	// No rehash, restore previous state to keep it consistent with
-	// container state.
-	_M_rehash_policy._M_reset(__saved_state);
-    }
-
-  template<typename _Key, typename _Value, typename _Alloc,
-	   typename _ExtractKey, typename _Equal,
-	   typename _Hash, typename _RangeHash, typename _Unused,
-	   typename _RehashPolicy, typename _Traits>
-    void
-    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
-	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_rehash(size_type __bkt_count, const __rehash_state& __state)
-    {
-      __try
-	{
-	  _M_rehash_aux(__bkt_count, __unique_keys{});
-	}
-      __catch(...)
 	{
-	  // A failure here means that buckets allocation failed.  We only
-	  // have to restore hash policy previous state.
-	  _M_rehash_policy._M_reset(__state);
-	  __throw_exception_again;
+	  _M_rehash(__bkt_count, __unique_keys{});
+	  __rehash_guard._M_guarded_obj = nullptr;
 	}
     }
 
@@ -2553,7 +2529,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     void
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_rehash_aux(size_type __bkt_count, true_type /* __uks */)
+    _M_rehash(size_type __bkt_count, true_type /* __uks */)
     {
       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
       __node_ptr __p = _M_begin();
@@ -2596,7 +2572,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     void
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_rehash_aux(size_type __bkt_count, false_type /* __uks */)
+    _M_rehash(size_type __bkt_count, false_type /* __uks */)
     {
       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
       __node_ptr __p = _M_begin();
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 5d162463dc3..5b97b24e156 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -715,6 +715,25 @@ namespace __detail
     std::size_t	_M_next_resize;
   };
 
+  template<typename _RehashPolicy>
+    struct _RehashStateGuard
+    {
+      _RehashPolicy* _M_guarded_obj;
+      typename _RehashPolicy::_State _M_prev_state;
+
+      _RehashStateGuard(_RehashPolicy& __policy)
+      : _M_guarded_obj(std::__addressof(__policy))
+      , _M_prev_state(__policy._M_state())
+      { }
+      _RehashStateGuard(const _RehashStateGuard&) = delete;
+
+      ~_RehashStateGuard()
+      {
+	if (_M_guarded_obj)
+	  _M_guarded_obj->_M_reset(_M_prev_state);
+      }
+    };
+
   // Base classes for std::_Hashtable.  We define these base classes
   // because in some cases we want to do different things depending on
   // the value of a policy class.  In some cases the policy class
@@ -1006,24 +1025,24 @@ namespace __detail
       _M_insert_range(_InputIterator __first, _InputIterator __last,
 		      const _NodeGetter& __node_gen, false_type __uks)
       {
-	using __rehash_type = typename __hashtable::__rehash_type;
-	using __rehash_state = typename __hashtable::__rehash_state;
-	using pair_type = std::pair<bool, std::size_t>;
+	using __rehash_guard_t = typename __hashtable::__rehash_guard_t;
+	using __pair_type = std::pair<bool, std::size_t>;
 
 	size_type __n_elt = __detail::__distance_fw(__first, __last);
 	if (__n_elt == 0)
 	  return;
 
 	__hashtable& __h = _M_conjure_hashtable();
-	__rehash_type& __rehash = __h._M_rehash_policy;
-	const __rehash_state& __saved_state = __rehash._M_state();
-	pair_type __do_rehash = __rehash._M_need_rehash(__h._M_bucket_count,
-							__h._M_element_count,
-							__n_elt);
+	__rehash_guard_t __rehash_guard(__h._M_rehash_policy);
+	__pair_type __do_rehash
+	  = __h._M_rehash_policy._M_need_rehash(__h._M_bucket_count,
+						__h._M_element_count,
+						__n_elt);
 
 	if (__do_rehash.first)
-	  __h._M_rehash(__do_rehash.second, __saved_state);
+	  __h._M_rehash(__do_rehash.second, __uks);
 
+	__rehash_guard._M_guarded_obj = nullptr;
 	for (; __first != __last; ++__first)
 	  __h._M_insert(*__first, __node_gen, __uks);
       }
Jonathan Wakely Nov. 9, 2023, 8:11 a.m. UTC | #4
On Thu, 26 Oct 2023 at 21:52, François Dumont <frs.dumont@gmail.com> wrote:
>
>
> On 26/10/2023 12:43, Jonathan Wakely wrote:
> > On 26/10/23 07:18 +0200, François Dumont wrote:
> >>     libstdc++: [_Hashtable] Use RAII type to manage rehash functor state
> >>
> >>     Replace usage of __try/__catch with a RAII type to restore rehash
> >> functor
> >>     state when needed.
> >
> > Generally I really like replacing try-catch with RAII but I have some
> > questions below.
> >
> >>     libstdc++-v3/ChangeLog:
> >>
> >>             * include/bits/hashtable_policy.h (_RehashStateGuard): New.
> >>             (_Insert_base<>::_M_insert_range(_IIt, _IIt, const
> >> _NodeGet&, false_type)):
> >>             Adapt.
> >>             * include/bits/hashtable.h (__rehash_guard_t): New.
> >>             (__rehash_state): Remove.
> >>             (_M_rehash): Remove.
> >>             (_M_rehash_aux): Rename into _M_rehash.
> >>             (_M_assign_elements, _M_insert_unique_node,
> >> _M_insert_multi_node): Adapt.
> >>             (rehash): Adapt.
> >>
> >>
> >> Tested under Linux x64.
> >>
> >> Ok to commit ?
> >>
> >> François
> >
> >> diff --git a/libstdc++-v3/include/bits/hashtable.h
> >> b/libstdc++-v3/include/bits/hashtable.h
> >> index 0857448f7ed..64071ac1fb2 100644
> >> --- a/libstdc++-v3/include/bits/hashtable.h
> >> +++ b/libstdc++-v3/include/bits/hashtable.h
> >> @@ -234,6 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>                           _RehashPolicy, _Traits>;
> >>       using __enable_default_ctor
> >>     = _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
> >> +      using __rehash_guard_t
> >> +    = __detail::_RehashStateGuard<_RehashPolicy>;
> >>
> >>     public:
> >>       typedef _Key                        key_type;
> >> @@ -264,7 +266,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >>     private:
> >>       using __rehash_type = _RehashPolicy;
> >> -      using __rehash_state = typename __rehash_type::_State;
> >>
> >>       using __unique_keys = typename __traits_type::__unique_keys;
> >>
> >> @@ -1200,14 +1201,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >>     private:
> >>       // Helper rehash method used when keys are unique.
> >> -      void _M_rehash_aux(size_type __bkt_count, true_type __uks);
> >> +      void _M_rehash(size_type __bkt_count, true_type __uks);
> >>
> >>       // Helper rehash method used when keys can be non-unique.
> >> -      void _M_rehash_aux(size_type __bkt_count, false_type __uks);
> >> -
> >> -      // Unconditionally change size of bucket array to n, restore
> >> -      // hash policy state to __state on exception.
> >> -      void _M_rehash(size_type __bkt_count, const __rehash_state&
> >> __state);
> >> +      void _M_rehash(size_type __bkt_count, false_type __uks);
> >>     };
> >>
> >>   // Definitions of class template _Hashtable's out-of-line member
> >> functions.
> >> @@ -1337,7 +1334,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>       {
> >>     __buckets_ptr __former_buckets = nullptr;
> >>     std::size_t __former_bucket_count = _M_bucket_count;
> >> -    const __rehash_state& __former_state = _M_rehash_policy._M_state();
> >> +    __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>
> >>     if (_M_bucket_count != __ht._M_bucket_count)
> >>       {
> >> @@ -1359,6 +1356,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>         _M_assign(std::forward<_Ht>(__ht), __roan);
> >>         if (__former_buckets)
> >>           _M_deallocate_buckets(__former_buckets,
> >> __former_bucket_count);
> >> +        __rehash_guard._M_reset = false;
> >
> > I find this confusing. Usually "reset" means that something is
> > cleared, so won't take an action in the destructor. e.g. if you use
> > std::unique_ptr::reset() then the object is destroyed immediately, and
> > then nothing happens in the destructor. Here it's the opposite,
> > _M_reset=true means that it _sould_ do something in the destructor.
> >
> > The problem is the ambiguity between "reset the state in the
> > destructor later" and "reset the object to an empty state now".
> >
> > If the member was called _M_guarded then it might be clearer.
> > _M_guarded=true means the guard is active, and will restore the state
> > later. Or _M_active, or _M_armed, or even _M_reset_in_dtor. Any of
> > those names avoids the confusion with the semantics of
> > std::unique_ptr::reset() and similar well-known APIs.
> >
> > Or what I usually do is store a pointer to the guarded object in the
> > RAII guard type, and then just null the pointer to disarm the guard.
> > That means you don't need a separate bool member variable. If
> > _RehashStateGuard::_M_rehash_policy was called _M_guarded_obj and was
> > a _RehashPolicy* instead of _RehashPolicy& then disarming it would be:
> >
> >        __rehash_guard._M_guarded_obj = nullptr;
> >
> > This seems clear to me, as it says that the guard no longer has
> > anything to guard, so won't do anything in the destructor.
> >
> Looks clearer to me too.
>
> I started with a _RehashStateScope and a _M_commit() method, it could
> have been worst :-)
>
>
> >
> >>       }
> >>     __catch(...)
> >>       {
> >> @@ -1366,7 +1364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>           {
> >>         // Restore previous buckets.
> >>         _M_deallocate_buckets();
> >> -        _M_rehash_policy._M_reset(__former_state);
> >>         _M_buckets = __former_buckets;
> >>         _M_bucket_count = __former_bucket_count;
> >>           }
> >> @@ -2142,17 +2139,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>               __node_ptr __node, size_type __n_elt)
> >>     -> iterator
> >>     {
> >> -      const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>       std::pair<bool, std::size_t> __do_rehash
> >>     = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
> >>                       __n_elt);
> >>
> >>       if (__do_rehash.first)
> >>     {
> >> -      _M_rehash(__do_rehash.second, __saved_state);
> >> +      _M_rehash(__do_rehash.second, true_type{});
> >>       __bkt = _M_bucket_index(__code);
> >>     }
> >>
> >> +      __rehash_guard._M_reset = false;
> >>       this->_M_store_code(*__node, __code);
> >
> > This changes behaviour. Previously the try-catch was inside the call to
> > _M_rehash. Now we're starting to guard before calling _M_need_rehash,
>
> We were capturing _M_rehash_policy state before the call to
> _M_need_rehash so it's not different.
>
> We really need to instantiate the guard before we call _M_need_rehash
> cause it is the method changing the _M_rehash_policy state.
>
> > and we run the destructor unconditionally even if we never call
> > _M_rehash.
>
> Here we only want to reset rehash state on exception.
>
> Note that with the 2 rehash policy implementations we provide it doesn't
> matter if we reset or not when we do not call _M_rehash because their
> state changes only when rehash is needed. But some user's fancy rehash
> policy might update their state even if no rehash is requested and in
> this case it is better to reset this state only on exception.
>
>
> > I think that's OK, because _M_need_rehash is noexcept (but
> > it's confusing because we have a const overload of _M_need_rehash
> > which is noexcept(false) and a non-const overload which is
> > noexcept(true) ... should they both be noexcept?)
>
> That's another subject but yes, _M_need_rehash should be non-const and
> noexcept on both rehash policy implementations. I considered doing such
> a thing but won't it impact abi as _Prime_rehash_policy._M_need_rehash
> symbols is coming from the lib ?
>
>
> >
> > Can we do this instead:
> >
> >
> > -      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
> >        std::pair<bool, std::size_t> __do_rehash
> >      = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
> >                        __n_elt);
> >
> >        if (__do_rehash.first)
> >      {
> > +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>
> Clearly no, we need to capture state before _M_need_rehash. Like it was
> done with the __saved_state above.
>
>
> > -      _M_rehash(__do_rehash.second, __saved_state);
> > +      _M_rehash(__do_rehash.second, true_type{});
> > +      __rehash_guard._M_reset = false;
> >        __bkt = _M_bucket_index(__code);
> >      }
> >
> >        this->_M_store_code(*__node, __code);
> >
> > This only creates a guard object if we're actually going to rehash, so
> > we don't run its destructor (and check if we need t restore anything)
> > when no rehash is needed.
> >
> > N.B. _M_bucket_index is also confusing. There are two overloads of it,
> > one is noexcept and one isn't. The one that we call here is
> > noexcept(false), so that call could throw ... is it OK that we don't
> > restore the old state if that happens? Should that be guarded too? (I
> > have no idea).
> >
> > Also, I see that the noexcept(true) overload of _M_bucket_index calls
> > __hash_code_base::_M_bucket_index, which is noexcept(/* condition */),
> > so the caller says it cannot throw, but it calls something that
> > sometimes throws. Is that correct?
>
> It can be noexcept because usage of hash code cache depends on either
> hash functor is noexcept or not.
>
> >
> >>
> >>       // Always insert at the beginning of the bucket.
> >> @@ -2172,13 +2170,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>              __hash_code __code, __node_ptr __node)
> >>     -> iterator
> >>     {
> >> -      const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>       std::pair<bool, std::size_t> __do_rehash
> >>     = _M_rehash_policy._M_need_rehash(_M_bucket_count,
> >> _M_element_count, 1);
> >>
> >>       if (__do_rehash.first)
> >> -    _M_rehash(__do_rehash.second, __saved_state);
> >> +    _M_rehash(__do_rehash.second, false_type{});
> >>
> >> +      __rehash_guard._M_reset = false;
> >
> > Again, we're creating the guard object in a larger scope than needed.
> >
> >>       this->_M_store_code(*__node, __code);
> >>       const key_type& __k = _ExtractKey{}(__node->_M_v());
> >>       size_type __bkt = _M_bucket_index(__code);
> >> @@ -2509,39 +2508,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >>     rehash(size_type __bkt_count)
> >>     {
> >> -      const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> +      __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>       __bkt_count
> >>     = std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count
> >> + 1),
> >>            __bkt_count);
> >>       __bkt_count = _M_rehash_policy._M_next_bkt(__bkt_count);
> >>
> >>       if (__bkt_count != _M_bucket_count)
> >> -    _M_rehash(__bkt_count, __saved_state);
> >> -      else
> >> -    // No rehash, restore previous state to keep it consistent with
> >> -    // container state.
> >> -    _M_rehash_policy._M_reset(__saved_state);
> >> -    }
> >> -
> >> -  template<typename _Key, typename _Value, typename _Alloc,
> >> -       typename _ExtractKey, typename _Equal,
> >> -       typename _Hash, typename _RangeHash, typename _Unused,
> >> -       typename _RehashPolicy, typename _Traits>
> >> -    void
> >> -    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >> -           _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> -    _M_rehash(size_type __bkt_count, const __rehash_state& __state)
> >> -    {
> >> -      __try
> >> -    {
> >> -      _M_rehash_aux(__bkt_count, __unique_keys{});
> >> -    }
> >> -      __catch(...)
> >>     {
> >> -      // A failure here means that buckets allocation failed. We only
> >> -      // have to restore hash policy previous state.
> >> -      _M_rehash_policy._M_reset(__state);
> >> -      __throw_exception_again;
> >> +      _M_rehash(__bkt_count, __unique_keys{});
> >> +      __rehash_guard._M_reset = false;
> >
> > And a larger scope again.
> >
> >>     }
> >>     }
> >>
> >> @@ -2553,7 +2529,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>     void
> >>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> -    _M_rehash_aux(size_type __bkt_count, true_type /* __uks */)
> >> +    _M_rehash(size_type __bkt_count, true_type /* __uks */)
> >>     {
> >>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
> >>       __node_ptr __p = _M_begin();
> >> @@ -2596,7 +2572,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>     void
> >>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >>            _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> -    _M_rehash_aux(size_type __bkt_count, false_type /* __uks */)
> >> +    _M_rehash(size_type __bkt_count, false_type /* __uks */)
> >>     {
> >>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
> >>       __node_ptr __p = _M_begin();
> >> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h
> >> b/libstdc++-v3/include/bits/hashtable_policy.h
> >> index 5d162463dc3..8b9626b1575 100644
> >> --- a/libstdc++-v3/include/bits/hashtable_policy.h
> >> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
> >> @@ -715,6 +715,26 @@ namespace __detail
> >>     std::size_t    _M_next_resize;
> >>   };
> >>
> >> +  template<typename _RehashPolicy>
> >> +    struct _RehashStateGuard
> >> +    {
> >> +      _RehashPolicy& _M_rehash_policy;
> >> +      typename _RehashPolicy::_State _M_prev_state;
> >> +      bool _M_reset = true;
> >
> > This could be:
> >
> > +      _RehashPolicy* _M_guarded_obj;
> > +      typename _RehashPolicy::_State _M_prev_state;
> >
> >> +
> >> +      _RehashStateGuard(_RehashPolicy& __policy)
> >> +      : _M_rehash_policy(__policy)
> >> +      , _M_prev_state(__policy._M_state())
> >
> > +      _RehashStateGuard(_RehashPolicy& __policy)
> > +      : _M_guarded_obj(std::__addressof(__policy))
> > +      , _M_prev_state(__policy._M_state())
> >
> >> +      { }
> >> +      _RehashStateGuard(const _RehashStateGuard&) = delete;
> >> +
> >> +      ~_RehashStateGuard()
> >> +      {
> >> +    if (_M_reset)
> >> +      _M_rehash_policy._M_reset(_M_prev_state);
> >
> > +    if (_M_guarded_obj)
> > +      _M_guarded_obj->_M_reset(_M_prev_state);
> >
> >> +      }
> >> +    };
> >> +
> >>   // Base classes for std::_Hashtable.  We define these base classes
> >>   // because in some cases we want to do different things depending on
> >>   // the value of a policy class.  In some cases the policy class
> >> @@ -1007,7 +1027,7 @@ namespace __detail
> >>               const _NodeGetter& __node_gen, false_type __uks)
> >>       {
> >>     using __rehash_type = typename __hashtable::__rehash_type;
> >> -    using __rehash_state = typename __hashtable::__rehash_state;
> >> +    using __rehash_guard_t = typename __hashtable::__rehash_guard_t;
> >>     using pair_type = std::pair<bool, std::size_t>;
> >>
> >>     size_type __n_elt = __detail::__distance_fw(__first, __last);
> >> @@ -1016,14 +1036,15 @@ namespace __detail
> >>
> >>     __hashtable& __h = _M_conjure_hashtable();
> >>     __rehash_type& __rehash = __h._M_rehash_policy;
> >> -    const __rehash_state& __saved_state = __rehash._M_state();
> >> +    __rehash_guard_t __rehash_guard(__rehash);
> >>     pair_type __do_rehash = __rehash._M_need_rehash(__h._M_bucket_count,
> >>                             __h._M_element_count,
> >>                             __n_elt);
> >>
> >>     if (__do_rehash.first)
> >> -      __h._M_rehash(__do_rehash.second, __saved_state);
> >> +      __h._M_rehash(__do_rehash.second, __uks);
> >>
> >> +    __rehash_guard._M_reset = false;
> >
> > A larger scope again.
>
> Note that there is only one place where we reset rehash policy state
> even if no exception took place. It is in the std rehash method when the
> user requests a rehash that has no effect.
>
> Here is the updated patch, ok to commit ?

Thanks for the explanations, OK for trunk.
diff mbox series

Patch

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 0857448f7ed..64071ac1fb2 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -234,6 +234,8 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 					      _RehashPolicy, _Traits>;
       using __enable_default_ctor
 	= _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
+      using __rehash_guard_t
+	= __detail::_RehashStateGuard<_RehashPolicy>;
 
     public:
       typedef _Key						key_type;
@@ -264,7 +266,6 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     private:
       using __rehash_type = _RehashPolicy;
-      using __rehash_state = typename __rehash_type::_State;
 
       using __unique_keys = typename __traits_type::__unique_keys;
 
@@ -1200,14 +1201,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     private:
       // Helper rehash method used when keys are unique.
-      void _M_rehash_aux(size_type __bkt_count, true_type __uks);
+      void _M_rehash(size_type __bkt_count, true_type __uks);
 
       // Helper rehash method used when keys can be non-unique.
-      void _M_rehash_aux(size_type __bkt_count, false_type __uks);
-
-      // Unconditionally change size of bucket array to n, restore
-      // hash policy state to __state on exception.
-      void _M_rehash(size_type __bkt_count, const __rehash_state& __state);
+      void _M_rehash(size_type __bkt_count, false_type __uks);
     };
 
   // Definitions of class template _Hashtable's out-of-line member functions.
@@ -1337,7 +1334,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
 	__buckets_ptr __former_buckets = nullptr;
 	std::size_t __former_bucket_count = _M_bucket_count;
-	const __rehash_state& __former_state = _M_rehash_policy._M_state();
+	__rehash_guard_t __rehash_guard(_M_rehash_policy);
 
 	if (_M_bucket_count != __ht._M_bucket_count)
 	  {
@@ -1359,6 +1356,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    _M_assign(std::forward<_Ht>(__ht), __roan);
 	    if (__former_buckets)
 	      _M_deallocate_buckets(__former_buckets, __former_bucket_count);
+	    __rehash_guard._M_reset = false;
 	  }
 	__catch(...)
 	  {
@@ -1366,7 +1364,6 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	      {
 		// Restore previous buckets.
 		_M_deallocate_buckets();
-		_M_rehash_policy._M_reset(__former_state);
 		_M_buckets = __former_buckets;
 		_M_bucket_count = __former_bucket_count;
 	      }
@@ -2142,17 +2139,18 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 			  __node_ptr __node, size_type __n_elt)
     -> iterator
     {
-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
       std::pair<bool, std::size_t> __do_rehash
 	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
 					  __n_elt);
 
       if (__do_rehash.first)
 	{
-	  _M_rehash(__do_rehash.second, __saved_state);
+	  _M_rehash(__do_rehash.second, true_type{});
 	  __bkt = _M_bucket_index(__code);
 	}
 
+      __rehash_guard._M_reset = false;
       this->_M_store_code(*__node, __code);
 
       // Always insert at the beginning of the bucket.
@@ -2172,13 +2170,14 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 			 __hash_code __code, __node_ptr __node)
     -> iterator
     {
-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
       std::pair<bool, std::size_t> __do_rehash
 	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count, 1);
 
       if (__do_rehash.first)
-	_M_rehash(__do_rehash.second, __saved_state);
+	_M_rehash(__do_rehash.second, false_type{});
 
+      __rehash_guard._M_reset = false;
       this->_M_store_code(*__node, __code);
       const key_type& __k = _ExtractKey{}(__node->_M_v());
       size_type __bkt = _M_bucket_index(__code);
@@ -2509,39 +2508,16 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
     rehash(size_type __bkt_count)
     {
-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
       __bkt_count
 	= std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count + 1),
 		   __bkt_count);
       __bkt_count = _M_rehash_policy._M_next_bkt(__bkt_count);
 
       if (__bkt_count != _M_bucket_count)
-	_M_rehash(__bkt_count, __saved_state);
-      else
-	// No rehash, restore previous state to keep it consistent with
-	// container state.
-	_M_rehash_policy._M_reset(__saved_state);
-    }
-
-  template<typename _Key, typename _Value, typename _Alloc,
-	   typename _ExtractKey, typename _Equal,
-	   typename _Hash, typename _RangeHash, typename _Unused,
-	   typename _RehashPolicy, typename _Traits>
-    void
-    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
-	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_rehash(size_type __bkt_count, const __rehash_state& __state)
-    {
-      __try
-	{
-	  _M_rehash_aux(__bkt_count, __unique_keys{});
-	}
-      __catch(...)
 	{
-	  // A failure here means that buckets allocation failed.  We only
-	  // have to restore hash policy previous state.
-	  _M_rehash_policy._M_reset(__state);
-	  __throw_exception_again;
+	  _M_rehash(__bkt_count, __unique_keys{});
+	  __rehash_guard._M_reset = false;
 	}
     }
 
@@ -2553,7 +2529,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     void
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_rehash_aux(size_type __bkt_count, true_type /* __uks */)
+    _M_rehash(size_type __bkt_count, true_type /* __uks */)
     {
       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
       __node_ptr __p = _M_begin();
@@ -2596,7 +2572,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     void
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_rehash_aux(size_type __bkt_count, false_type /* __uks */)
+    _M_rehash(size_type __bkt_count, false_type /* __uks */)
     {
       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
       __node_ptr __p = _M_begin();
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 5d162463dc3..8b9626b1575 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -715,6 +715,26 @@  namespace __detail
     std::size_t	_M_next_resize;
   };
 
+  template<typename _RehashPolicy>
+    struct _RehashStateGuard
+    {
+      _RehashPolicy& _M_rehash_policy;
+      typename _RehashPolicy::_State _M_prev_state;
+      bool _M_reset = true;
+
+      _RehashStateGuard(_RehashPolicy& __policy)
+      : _M_rehash_policy(__policy)
+      , _M_prev_state(__policy._M_state())
+      { }
+      _RehashStateGuard(const _RehashStateGuard&) = delete;
+
+      ~_RehashStateGuard()
+      {
+	if (_M_reset)
+	  _M_rehash_policy._M_reset(_M_prev_state);
+      }
+    };
+
   // Base classes for std::_Hashtable.  We define these base classes
   // because in some cases we want to do different things depending on
   // the value of a policy class.  In some cases the policy class
@@ -1007,7 +1027,7 @@  namespace __detail
 		      const _NodeGetter& __node_gen, false_type __uks)
       {
 	using __rehash_type = typename __hashtable::__rehash_type;
-	using __rehash_state = typename __hashtable::__rehash_state;
+	using __rehash_guard_t = typename __hashtable::__rehash_guard_t;
 	using pair_type = std::pair<bool, std::size_t>;
 
 	size_type __n_elt = __detail::__distance_fw(__first, __last);
@@ -1016,14 +1036,15 @@  namespace __detail
 
 	__hashtable& __h = _M_conjure_hashtable();
 	__rehash_type& __rehash = __h._M_rehash_policy;
-	const __rehash_state& __saved_state = __rehash._M_state();
+	__rehash_guard_t __rehash_guard(__rehash);
 	pair_type __do_rehash = __rehash._M_need_rehash(__h._M_bucket_count,
 							__h._M_element_count,
 							__n_elt);
 
 	if (__do_rehash.first)
-	  __h._M_rehash(__do_rehash.second, __saved_state);
+	  __h._M_rehash(__do_rehash.second, __uks);
 
+	__rehash_guard._M_reset = false;
 	for (; __first != __last; ++__first)
 	  __h._M_insert(*__first, __node_gen, __uks);
       }