Review Hashtable extract node API

Message ID	0a927e9b-97d5-5fc8-fb36-2af5848e017b@gmail.com
State	New
Headers	show Return-Path: <gcc-patches-return-502329-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=Y3F67wkBZoGAaxezDePgE3XN1UYSb5NnE5mr+WuZt0HkB8MqR6 okikdyDuHGph+02wiMKQwNy1eF9DnNKYDNuw0hP3nIhHNogbyG9heWTqkk/dIZEO B8rHDYSzmdym82bWWxC5bfhXl+mHjcuDsMHZUVhP9PP58E01A8J/DjMbQ= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org To: "libstdc++@gcc.gnu.org" <libstdc++@gcc.gnu.org>, gcc-patches <gcc-patches@gcc.gnu.org> From: =?utf-8?q?Fran=C3=A7ois_Dumont?= <frs.dumont@gmail.com> Subject: Review Hashtable extract node API Message-ID: <0a927e9b-97d5-5fc8-fb36-2af5848e017b@gmail.com> Date: Tue, 4 Jun 2019 19:19:59 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------34BA1FE9B037A781958857A5"
Series	Review Hashtable extract node API \| expand Review Hashtable extract node API

François Dumont June 4, 2019, 5:19 p.m. UTC

Hi

     Here is a patch to enhance the _Hashtable extract node API and fix 
a FIXME request.

     The enhancement to the extract node Api is that extract(const 
key_type&) do not call extract(const_iterator) anymore. Doing so we had 
to loop again through bucket nodes to find the previous node to the one 
to extract. Even if a bucket shall not contain many nodes (in unique key 
mode) it's easy to avoid it.

     To fix the FIXME I introduced a node smart pointer type managing 
the node lifetime. The node is extracted from this smart pointer only 
when there can't be any exception raised. In the context of the node 
extract api the node handle is considered as a smart pointer. So the 
node handle will remain owner of the node in case of exception when 
reinserting it, I hope it is the expected behavior.

     * include/bits/hashtable_policy.h
     (struct _NodeSmartPointer<_NodeAlloc>): New.
     (_Map_base<>::operator[](const key_type&)): Use latter, adapt.
     (_Map_base<>::operator[](key_type&&)): Likewise.
     * include/bits/hashtable.h
     (_Hashtable<>::__node_sp_t): New.
     (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code,
     __node_type*, size_type)): Replace by...
(_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&,
     size_type, __hash_code, const _NodeAccessor&, size_type)): ...that.
     (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code,
     __node_type*)): Replace by...
(_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*,
     __hash_code, const _NodeAccessor&)): ...that.
     (_Hashtable<>::_M_reinsert_node): Adapt.
     (_Hashtable<>::_M_reinsert_node_multi): Adapt.
     (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New.
     (_Hashtable<>::extract(const_iterator)): Use latter.
     (_Hashtable<>::extract(const _Key&)): Likewise.
     (_Hashtable<>::_M_merge_unique): Adapt.
     (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt.
     (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type,
     _Args&&...)): Adapt.

Tested under Linux x86_64.

Ok to commit ?

François

Jonathan Wakely June 5, 2019, 4:22 p.m. UTC | #1

On 04/06/19 19:19 +0200, François Dumont wrote:
>Hi
>
>    Here is a patch to enhance the _Hashtable extract node API and fix 
>a FIXME request.
>
>    The enhancement to the extract node Api is that extract(const 
>key_type&) do not call extract(const_iterator) anymore. Doing so we 
>had to loop again through bucket nodes to find the previous node to 
>the one to extract. Even if a bucket shall not contain many nodes (in 
>unique key mode) it's easy to avoid it.

Nice.

>    To fix the FIXME I introduced a node smart pointer type managing 
>the node lifetime. The node is extracted from this smart pointer only 
>when there can't be any exception raised. In the context of the node 
>extract api the node handle is considered as a smart pointer. So the 
>node handle will remain owner of the node in case of exception when 
>reinserting it, I hope it is the expected behavior.

Yes, that's right.

I was going to suggest just using the node handle type instead of
inventing a new smart pointer, but the handle type uses std::optional
so isn't available for C++11/14.


>    * include/bits/hashtable_policy.h
>    (struct _NodeSmartPointer<_NodeAlloc>): New.
>    (_Map_base<>::operator[](const key_type&)): Use latter, adapt.
>    (_Map_base<>::operator[](key_type&&)): Likewise.
>    * include/bits/hashtable.h
>    (_Hashtable<>::__node_sp_t): New.
>    (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code,
>    __node_type*, size_type)): Replace by...
>(_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&,
>    size_type, __hash_code, const _NodeAccessor&, size_type)): ...that.
>    (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code,
>    __node_type*)): Replace by...
>(_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*,
>    __hash_code, const _NodeAccessor&)): ...that.
>    (_Hashtable<>::_M_reinsert_node): Adapt.
>    (_Hashtable<>::_M_reinsert_node_multi): Adapt.
>    (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New.
>    (_Hashtable<>::extract(const_iterator)): Use latter.
>    (_Hashtable<>::extract(const _Key&)): Likewise.
>    (_Hashtable<>::_M_merge_unique): Adapt.
>    (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt.
>    (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type,
>    _Args&&...)): Adapt.
>
>Tested under Linux x86_64.
>
>Ok to commit ?
>
>François
>

>diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
>index e2e3f016a35..307865b96bf 100644
>--- a/libstdc++-v3/include/bits/hashtable.h
>+++ b/libstdc++-v3/include/bits/hashtable.h
>@@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       using __hash_cached = typename __traits_type::__hash_cached;
>       using __node_type = __detail::_Hash_node<_Value, __hash_cached::value>;
>       using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>;
>+      using __node_sp_t = __detail::_NodeSmartPointer<__node_alloc_type>;
> 
>       using __hashtable_alloc = __detail::_Hashtable_alloc<__node_alloc_type>;
> 
>@@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       __node_base*
>       _M_get_previous_node(size_type __bkt, __node_base* __n);
> 
>-      // Insert node with hash code __code, in bucket bkt if no rehash (assumes
>-      // no element with its key already present). Take ownership of the node,
>-      // deallocate it on exception.
>+      // Insert node with key __k and hash code __code, in bucket __bkt if no
>+      // rehash (assumes no element with its key already present).
>+      template<typename _NodeAccessor>
> 	iterator
>-      _M_insert_unique_node(size_type __bkt, __hash_code __code,
>-			    __node_type* __n, size_type __n_elt = 1);
>+	_M_insert_unique_node(const key_type& __k, size_type __bkt,
>+			      __hash_code __code, const _NodeAccessor&,
>+			      size_type __n_elt = 1);
> 
>-      // Insert node with hash code __code. Take ownership of the node,
>-      // deallocate it on exception.
>+      // Insert node with hash code __code.
>+      template<typename _NodeAccessor>
> 	iterator
>-      _M_insert_multi_node(__node_type* __hint,
>-			   __hash_code __code, __node_type* __n);
>+	_M_insert_multi_node(__node_type* __hint, __hash_code __code,
>+			     const _NodeAccessor& __node_accessor);

It looks like most times you call these functions you pass an
identical lambda expression, but each of those lambda expressions will
create a unique type. That means you create different instantiations
of the function templates even though they do exactly the same thing.

That's just generating multiple copies of identical code. Passing in a
function object to provide the node pointer doesn't really seem
necessary anyway, so if it results in larger executables it's really
not desirable.

The attached patch still just passes in a node pointer (which the
function takes ownership of, unless it throws). Because the semantics
of _M_insert_multi_node change, we need the symbol name to change as
well (so old code linking to the old symbol doesn't find the new
definition with different semantics). Passing in the key makes it a
different symbol (in almost all cases the caller has the key available
already anyway).

An alternative design would be for _M_insert_(unique|multi)_node to
take a reference to the RAII type and set its _M_node member to null
after taking ownership of that pointer. That would be more explicit
about the ownership transfer, however it would require changes to the
_M_reinsert_node and _M_reinsert_node_multi functions which don't use
the RAII type (because the __node_type* is already owned by a node
handle).

> 
>       template<typename... _Args>
> 	std::pair<iterator, bool>
>@@ -805,9 +807,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	      }
> 	    else
> 	      {
>-		__ret.position
>-		  = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
>-		__nh._M_ptr = nullptr;
>+		__ret.position = _M_insert_unique_node(__k, __bkt, __code,
>+				[&__nh]()
>+				{ return std::exchange(__nh._M_ptr, nullptr); });

The existing code here can just be changed to pass in __k, but still
set __nh._M_ptr to null after the call returns.

i.e.

		__ret.position
		  = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
		__nh._M_ptr = nullptr;


> 		__ret.inserted = true;
> 	      }
> 	  }
>@@ -818,33 +820,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       iterator
>       _M_reinsert_node_multi(const_iterator __hint, node_type&& __nh)
>       {
>-	iterator __ret;
> 	if (__nh.empty())
>-	  __ret = end();
>-	else
>-	  {
>+	  return end();
>+
> 	__glibcxx_assert(get_allocator() == __nh.get_allocator());
> 
> 	auto __code = this->_M_hash_code(__nh._M_key());
>-	    auto __node = std::exchange(__nh._M_ptr, nullptr);
>-	    // FIXME: this deallocates the node on exception.
>-	    __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>-	  }
>-	return __ret;
>+	return _M_insert_multi_node(__hint._M_cur, __code,
>+			      [&__nh]()
>+			      { return std::exchange(__nh._M_ptr, nullptr); });

Now that the call won't deallocate anything, we can fix the FIXME by
just delaying resetting __nh._M_ptr until the call has returned:

        auto __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
        __nh._M_ptr = nullptr;
        return __ret;


>       }
> 
>+    private:
>       /// Extract a node.

We don't want a Doxygen "///" comment on a private function, just "//"
is OK. In this case I don't think we need any comment, the
_M_extract_node name is clear enough. Saying "extract a node" adds
nothing.


>       node_type
>-      extract(const_iterator __pos)
>+      _M_extract_node(size_t __bkt, __node_base* __prev_n)
>       {
>-	__node_type* __n = __pos._M_cur;
>-	size_t __bkt = _M_bucket_index(__n);
>-
>-	// Look for previous node to unlink it from the erased one, this
>-	// is why we need buckets to contain the before begin to make
>-	// this search fast.
>-	__node_base* __prev_n = _M_get_previous_node(__bkt, __n);
>-
>+	__node_type* __n = static_cast<__node_type*>(__prev_n->_M_nxt);
> 	if (__prev_n == _M_buckets[__bkt])
> 	  _M_remove_bucket_begin(__bkt, __n->_M_next(),
> 	     __n->_M_nxt ? _M_bucket_index(__n->_M_next()) : 0);
>@@ -861,14 +853,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	return { __n, this->_M_node_allocator() };
>       }
> 
>+    public:
>+      /// Extract a node.
>+      node_type
>+      extract(const_iterator __pos)
>+      {
>+	size_t __bkt = _M_bucket_index(__pos._M_cur);
>+	return _M_extract_node(__bkt, _M_get_previous_node(__bkt, __pos._M_cur));
>+      }
>+
>       /// Extract a node.
>       node_type
>       extract(const _Key& __k)
>       {
> 	node_type __nh;
>-	auto __pos = find(__k);
>-	if (__pos != end())
>-	  __nh = extract(const_iterator(__pos));
>+	__hash_code __code = this->_M_hash_code(__k);
>+	std::size_t __bkt = _M_bucket_index(__k, __code);
>+	if (__node_base* __prev_node = _M_find_before_node(__bkt, __k, __code))
>+	  __nh = _M_extract_node(__bkt, __prev_node);
> 	return __nh;
>       }

Looks good.

>@@ -885,14 +887,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	  for (auto __i = __src.begin(), __end = __src.end(); __i != __end;)
> 	    {
> 	      auto __pos = __i++;
>-	      const key_type& __k = this->_M_extract()(__pos._M_cur->_M_v());
>+	      const key_type& __k = this->_M_extract()(*__pos);
> 	      __hash_code __code = this->_M_hash_code(__k);
> 	      size_type __bkt = _M_bucket_index(__k, __code);
> 	      if (_M_find_node(__bkt, __k, __code) == nullptr)
> 		{
> 		  auto __nh = __src.extract(__pos);
>-		  _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
>-		  __nh._M_ptr = nullptr;
>+		  _M_insert_unique_node(__k, __bkt, __code,
>+				[&__nh]()
>+				{ return std::exchange(__nh._M_ptr, nullptr); },
>+				__n_elt);

Again, the old code seems fine. Just pass in __k.

> 		  __n_elt = 1;
> 		}
> 	      else if (__n_elt != 1)
>@@ -1634,31 +1638,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       -> pair<iterator, bool>
>       {
> 	// First build the node to get access to the hash code
>-	__node_type* __node
>-	  = this->_M_allocate_node(std::forward<_Args>(__args)...);
>-	const key_type& __k = this->_M_extract()(__node->_M_v());
>-	__hash_code __code;
>-	__try
>-	  {
>-	    __code = this->_M_hash_code(__k);
>-	  }
>-	__catch(...)
>-	  {
>-	    this->_M_deallocate_node(__node);

Aside: It's confusing that _M_allocate_node actually allocates a node
and constructs an element, and _M_deallocate_node destroys the element
and deallocates the node (and we have _M_deallocate_node_ptr which
just does the deallocation part). We should add comments explaining
that.


>diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
>index 878154ae210..ddcff0ec9d0 100644
>--- a/libstdc++-v3/include/bits/hashtable_policy.h
>+++ b/libstdc++-v3/include/bits/hashtable_policy.h
>@@ -170,6 +170,40 @@ namespace __detail
>       __hashtable_alloc& _M_h;
>     };
> 
>+  // Node smart pointer to benefit from RAII.
>+  template<typename _NodeAlloc>
>+    struct _NodeSmartPointer
>+    {
>+    private:
>+      using __hashtable_alloc = _Hashtable_alloc<_NodeAlloc>;
>+      using __node_type = typename __hashtable_alloc::__node_type;
>+
>+    public:
>+      _NodeSmartPointer(__hashtable_alloc& __h, __node_type* __node) noexcept
>+      : _M_h(__h), _M_node(__node) { }
>+
>+      ~_NodeSmartPointer()
>+      {
>+	if (_M_node)
>+	  _M_h._M_deallocate_node(_M_node);
>+      }
>+
>+      __node_type*
>+      _M_get() const { return _M_node; }
>+
>+      __node_type*
>+      _M_release() noexcept
>+      {
>+	__node_type* __tmp = _M_node;
>+	_M_node = nullptr;
>+	return __tmp;
>+      }
>+
>+    private:
>+      __hashtable_alloc& _M_h;
>+      __node_type* _M_node;
>+    };

This can be defined as a member of _Hashtable, and can be much
simpler:

      // Simple RAII type for managing a node containing an element
      struct _Scoped_node
      {
	_Scoped_node(__hashtable_alloc* __h, __node_type* __n)
	: _M_h(__h), _M_node(__n) { }

	~_Scoped_node() { if (_M_node) _M_h->_M_deallocate_node(_M_node); };

	_Scoped_node(const _Scoped_node&) = delete;
	_Scoped_node& operator=(const _Scoped_node&) = delete;

	__hashtable_alloc* _M_h;
	__node_type* _M_node;
      };

I've attached a patch implementing these suggested changes. I think
it's simpler this way, what do you think?

(This patch is generated with 'git diff -b' so whitespace changes
aren't shown, so this shouldn't be applied directly. I can send
another version with the whitespace changes that is suitable to
commit).

Jonathan Wakely June 5, 2019, 4:43 p.m. UTC | #2

On 05/06/19 17:22 +0100, Jonathan Wakely wrote:
>On 04/06/19 19:19 +0200, François Dumont wrote:
>>@@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>      __node_base*
>>      _M_get_previous_node(size_type __bkt, __node_base* __n);
>>
>>-      // Insert node with hash code __code, in bucket bkt if no rehash (assumes
>>-      // no element with its key already present). Take ownership of the node,
>>-      // deallocate it on exception.
>>+      // Insert node with key __k and hash code __code, in bucket __bkt if no
>>+      // rehash (assumes no element with its key already present).
>>+      template<typename _NodeAccessor>
>>	iterator
>>-      _M_insert_unique_node(size_type __bkt, __hash_code __code,
>>-			    __node_type* __n, size_type __n_elt = 1);
>>+	_M_insert_unique_node(const key_type& __k, size_type __bkt,
>>+			      __hash_code __code, const _NodeAccessor&,
>>+			      size_type __n_elt = 1);
>>
>>-      // Insert node with hash code __code. Take ownership of the node,
>>-      // deallocate it on exception.
>>+      // Insert node with hash code __code.
>>+      template<typename _NodeAccessor>
>>	iterator
>>-      _M_insert_multi_node(__node_type* __hint,
>>-			   __hash_code __code, __node_type* __n);
>>+	_M_insert_multi_node(__node_type* __hint, __hash_code __code,
>>+			     const _NodeAccessor& __node_accessor);
>
>It looks like most times you call these functions you pass an
>identical lambda expression, but each of those lambda expressions will
>create a unique type. That means you create different instantiations
>of the function templates even though they do exactly the same thing.
>
>That's just generating multiple copies of identical code. Passing in a
>function object to provide the node pointer doesn't really seem
>necessary anyway, so if it results in larger executables it's really
>not desirable.

Also I didn't really like the name NodeAccessor. It's not an accessor,
because it performs ownership transfer. Invoking __node_accessor()
returns a __node_type* by releasing it from the previous owner (by
setting the owner's pointer member to null).

Passing a const reference to something called NodeAccessor does not
make it clear that it performs a mutating operation like that! If the
_M_insert_unique_node and _M_insert_multi_node functions did the
__node_accessor() call *before* rehashing, and rehashing threw an
exception, then they would leak. So it's important that the
__node_acessor() call happens at the right time, and so it's important
to name it well.

In my suggested patch the naming isn't misleading, because we just
pass a raw __node_type* and have a new comment saying:

      // Takes ownership of __n if insertion succeeds, throws otherwise.

The function doesn't have a callable with non-local effects that
modifies an object outside the function. Because the caller sets the
previous owner's pointer to null there's no danger of it happening at
the wrong time; it can only happen after the function has returned and
ownership transfer has completed.

Jonathan Wakely June 5, 2019, 7:18 p.m. UTC | #3

On 05/06/19 17:43 +0100, Jonathan Wakely wrote:
>On 05/06/19 17:22 +0100, Jonathan Wakely wrote:
>>On 04/06/19 19:19 +0200, François Dumont wrote:
>>>@@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>     __node_base*
>>>     _M_get_previous_node(size_type __bkt, __node_base* __n);
>>>
>>>-      // Insert node with hash code __code, in bucket bkt if no rehash (assumes
>>>-      // no element with its key already present). Take ownership of the node,
>>>-      // deallocate it on exception.
>>>+      // Insert node with key __k and hash code __code, in bucket __bkt if no
>>>+      // rehash (assumes no element with its key already present).
>>>+      template<typename _NodeAccessor>
>>>	iterator
>>>-      _M_insert_unique_node(size_type __bkt, __hash_code __code,
>>>-			    __node_type* __n, size_type __n_elt = 1);
>>>+	_M_insert_unique_node(const key_type& __k, size_type __bkt,
>>>+			      __hash_code __code, const _NodeAccessor&,
>>>+			      size_type __n_elt = 1);
>>>
>>>-      // Insert node with hash code __code. Take ownership of the node,
>>>-      // deallocate it on exception.
>>>+      // Insert node with hash code __code.
>>>+      template<typename _NodeAccessor>
>>>	iterator
>>>-      _M_insert_multi_node(__node_type* __hint,
>>>-			   __hash_code __code, __node_type* __n);
>>>+	_M_insert_multi_node(__node_type* __hint, __hash_code __code,
>>>+			     const _NodeAccessor& __node_accessor);
>>
>>It looks like most times you call these functions you pass an
>>identical lambda expression, but each of those lambda expressions will
>>create a unique type. That means you create different instantiations
>>of the function templates even though they do exactly the same thing.
>>
>>That's just generating multiple copies of identical code. Passing in a
>>function object to provide the node pointer doesn't really seem
>>necessary anyway, so if it results in larger executables it's really
>>not desirable.
>
>Also I didn't really like the name NodeAccessor. It's not an accessor,
>because it performs ownership transfer. Invoking __node_accessor()
>returns a __node_type* by releasing it from the previous owner (by
>setting the owner's pointer member to null).
>
>Passing a const reference to something called NodeAccessor does not
>make it clear that it performs a mutating operation like that! If the
>_M_insert_unique_node and _M_insert_multi_node functions did the
>__node_accessor() call *before* rehashing, and rehashing threw an
>exception, then they would leak. So it's important that the
>__node_acessor() call happens at the right time, and so it's important
>to name it well.
>
>In my suggested patch the naming isn't misleading, because we just
>pass a raw __node_type* and have a new comment saying:
>
>     // Takes ownership of __n if insertion succeeds, throws otherwise.
>
>The function doesn't have a callable with non-local effects that
>modifies an object outside the function. Because the caller sets the
>previous owner's pointer to null there's no danger of it happening at
>the wrong time; it can only happen after the function has returned and
>ownership transfer has completed.

As a further evolution that simplifies some uses of _Scoped_node we
could give it a constructor that allocates a node and constructs an
element, as in the attached patch.

Jonathan Wakely June 6, 2019, 11:41 a.m. UTC | #4

On 05/06/19 20:18 +0100, Jonathan Wakely wrote:
>On 05/06/19 17:43 +0100, Jonathan Wakely wrote:
>>On 05/06/19 17:22 +0100, Jonathan Wakely wrote:
>>>On 04/06/19 19:19 +0200, François Dumont wrote:
>>>>@@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>    __node_base*
>>>>    _M_get_previous_node(size_type __bkt, __node_base* __n);
>>>>
>>>>-      // Insert node with hash code __code, in bucket bkt if no rehash (assumes
>>>>-      // no element with its key already present). Take ownership of the node,
>>>>-      // deallocate it on exception.
>>>>+      // Insert node with key __k and hash code __code, in bucket __bkt if no
>>>>+      // rehash (assumes no element with its key already present).
>>>>+      template<typename _NodeAccessor>
>>>>	iterator
>>>>-      _M_insert_unique_node(size_type __bkt, __hash_code __code,
>>>>-			    __node_type* __n, size_type __n_elt = 1);
>>>>+	_M_insert_unique_node(const key_type& __k, size_type __bkt,
>>>>+			      __hash_code __code, const _NodeAccessor&,
>>>>+			      size_type __n_elt = 1);
>>>>
>>>>-      // Insert node with hash code __code. Take ownership of the node,
>>>>-      // deallocate it on exception.
>>>>+      // Insert node with hash code __code.
>>>>+      template<typename _NodeAccessor>
>>>>	iterator
>>>>-      _M_insert_multi_node(__node_type* __hint,
>>>>-			   __hash_code __code, __node_type* __n);
>>>>+	_M_insert_multi_node(__node_type* __hint, __hash_code __code,
>>>>+			     const _NodeAccessor& __node_accessor);
>>>
>>>It looks like most times you call these functions you pass an
>>>identical lambda expression, but each of those lambda expressions will
>>>create a unique type. That means you create different instantiations
>>>of the function templates even though they do exactly the same thing.
>>>
>>>That's just generating multiple copies of identical code. Passing in a
>>>function object to provide the node pointer doesn't really seem
>>>necessary anyway, so if it results in larger executables it's really
>>>not desirable.
>>
>>Also I didn't really like the name NodeAccessor. It's not an accessor,
>>because it performs ownership transfer. Invoking __node_accessor()
>>returns a __node_type* by releasing it from the previous owner (by
>>setting the owner's pointer member to null).
>>
>>Passing a const reference to something called NodeAccessor does not
>>make it clear that it performs a mutating operation like that! If the
>>_M_insert_unique_node and _M_insert_multi_node functions did the
>>__node_accessor() call *before* rehashing, and rehashing threw an
>>exception, then they would leak. So it's important that the
>>__node_acessor() call happens at the right time, and so it's important
>>to name it well.
>>
>>In my suggested patch the naming isn't misleading, because we just
>>pass a raw __node_type* and have a new comment saying:
>>
>>    // Takes ownership of __n if insertion succeeds, throws otherwise.
>>
>>The function doesn't have a callable with non-local effects that
>>modifies an object outside the function. Because the caller sets the
>>previous owner's pointer to null there's no danger of it happening at
>>the wrong time; it can only happen after the function has returned and
>>ownership transfer has completed.
>
>As a further evolution that simplifies some uses of _Scoped_node we
>could give it a constructor that allocates a node and constructs an
>element, as in the attached patch.

Of course all this code is completely wrong, because it uses raw
pointers not the allocator's pointer type. But that's a much bigger
problem that needs to be solved separately.

François Dumont June 7, 2019, 4:39 p.m. UTC | #5

On 6/5/19 6:22 PM, Jonathan Wakely wrote:
> On 04/06/19 19:19 +0200, François Dumont wrote:
>> Hi
>>
>>     Here is a patch to enhance the _Hashtable extract node API and 
>> fix a FIXME request.
>>
>>     The enhancement to the extract node Api is that extract(const 
>> key_type&) do not call extract(const_iterator) anymore. Doing so we 
>> had to loop again through bucket nodes to find the previous node to 
>> the one to extract. Even if a bucket shall not contain many nodes (in 
>> unique key mode) it's easy to avoid it.
>
> Nice.
>
>>     To fix the FIXME I introduced a node smart pointer type managing 
>> the node lifetime. The node is extracted from this smart pointer only 
>> when there can't be any exception raised. In the context of the node 
>> extract api the node handle is considered as a smart pointer. So the 
>> node handle will remain owner of the node in case of exception when 
>> reinserting it, I hope it is the expected behavior.
>
> Yes, that's right.
>
> I was going to suggest just using the node handle type instead of
> inventing a new smart pointer, but the handle type uses std::optional
> so isn't available for C++11/14.

I considered it too, or even use shared_ptr but thought it was overkill.

>
>
>>     * include/bits/hashtable_policy.h
>>     (struct _NodeSmartPointer<_NodeAlloc>): New.
>>     (_Map_base<>::operator[](const key_type&)): Use latter, adapt.
>>     (_Map_base<>::operator[](key_type&&)): Likewise.
>>     * include/bits/hashtable.h
>>     (_Hashtable<>::__node_sp_t): New.
>>     (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code,
>>     __node_type*, size_type)): Replace by...
>> (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&,
>>     size_type, __hash_code, const _NodeAccessor&, size_type)): ...that.
>>     (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code,
>>     __node_type*)): Replace by...
>> (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*,
>>     __hash_code, const _NodeAccessor&)): ...that.
>>     (_Hashtable<>::_M_reinsert_node): Adapt.
>>     (_Hashtable<>::_M_reinsert_node_multi): Adapt.
>>     (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New.
>>     (_Hashtable<>::extract(const_iterator)): Use latter.
>>     (_Hashtable<>::extract(const _Key&)): Likewise.
>>     (_Hashtable<>::_M_merge_unique): Adapt.
>>     (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt.
>>     (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type,
>>     _Args&&...)): Adapt.
>>
>> Tested under Linux x86_64.
>>
>> Ok to commit ?
>>
>> François
>>
>
>> diff --git a/libstdc++-v3/include/bits/hashtable.h 
>> b/libstdc++-v3/include/bits/hashtable.h
>> index e2e3f016a35..307865b96bf 100644
>> --- a/libstdc++-v3/include/bits/hashtable.h
>> +++ b/libstdc++-v3/include/bits/hashtable.h
>> @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>       using __hash_cached = typename __traits_type::__hash_cached;
>>       using __node_type = __detail::_Hash_node<_Value, 
>> __hash_cached::value>;
>>       using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>;
>> +      using __node_sp_t = 
>> __detail::_NodeSmartPointer<__node_alloc_type>;
>>
>>       using __hashtable_alloc = 
>> __detail::_Hashtable_alloc<__node_alloc_type>;
>>
>> @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>       __node_base*
>>       _M_get_previous_node(size_type __bkt, __node_base* __n);
>>
>> -      // Insert node with hash code __code, in bucket bkt if no 
>> rehash (assumes
>> -      // no element with its key already present). Take ownership of 
>> the node,
>> -      // deallocate it on exception.
>> +      // Insert node with key __k and hash code __code, in bucket 
>> __bkt if no
>> +      // rehash (assumes no element with its key already present).
>> +      template<typename _NodeAccessor>
>>     iterator
>> -      _M_insert_unique_node(size_type __bkt, __hash_code __code,
>> -                __node_type* __n, size_type __n_elt = 1);
>> +    _M_insert_unique_node(const key_type& __k, size_type __bkt,
>> +                  __hash_code __code, const _NodeAccessor&,
>> +                  size_type __n_elt = 1);
>>
>> -      // Insert node with hash code __code. Take ownership of the node,
>> -      // deallocate it on exception.
>> +      // Insert node with hash code __code.
>> +      template<typename _NodeAccessor>
>>     iterator
>> -      _M_insert_multi_node(__node_type* __hint,
>> -               __hash_code __code, __node_type* __n);
>> +    _M_insert_multi_node(__node_type* __hint, __hash_code __code,
>> +                 const _NodeAccessor& __node_accessor);
>
> It looks like most times you call these functions you pass an
> identical lambda expression, but each of those lambda expressions will
> create a unique type. That means you create different instantiations
> of the function templates even though they do exactly the same thing.
>
> That's just generating multiple copies of identical code. Passing in a
> function object to provide the node pointer doesn't really seem
> necessary anyway, so if it results in larger executables it's really
> not desirable.


Maybe one the reasons we have a difference when running performance 
tests in -03. I don't know why I abused so much of lambdas when I see 
your proposal.


>
> The attached patch still just passes in a node pointer (which the
> function takes ownership of, unless it throws). Because the semantics
> of _M_insert_multi_node change, we need the symbol name to change as
> well (so old code linking to the old symbol doesn't find the new
> definition with different semantics). Passing in the key makes it a
> different symbol (in almost all cases the caller has the key available
> already anyway).
Very nice.
>
> An alternative design would be for _M_insert_(unique|multi)_node to
> take a reference to the RAII type and set its _M_node member to null
> after taking ownership of that pointer. That would be more explicit
> about the ownership transfer, however it would require changes to the
> _M_reinsert_node and _M_reinsert_node_multi functions which don't use
> the RAII type (because the __node_type* is already owned by a node
> handle).
>
>>
>>       template<typename... _Args>
>>     std::pair<iterator, bool>
>> @@ -805,9 +807,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>           }
>>         else
>>           {
>> -        __ret.position
>> -          = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
>> -        __nh._M_ptr = nullptr;
>> +        __ret.position = _M_insert_unique_node(__k, __bkt, __code,
>> +                [&__nh]()
>> +                { return std::exchange(__nh._M_ptr, nullptr); });
>
> The existing code here can just be changed to pass in __k, but still
> set __nh._M_ptr to null after the call returns.
>
> i.e.
>
>         __ret.position
>           = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
>         __nh._M_ptr = nullptr;
>
>
>>         __ret.inserted = true;
>>           }
>>       }
>> @@ -818,33 +820,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>       iterator
>>       _M_reinsert_node_multi(const_iterator __hint, node_type&& __nh)
>>       {
>> -    iterator __ret;
>>     if (__nh.empty())
>> -      __ret = end();
>> -    else
>> -      {
>> +      return end();
>> +
>>     __glibcxx_assert(get_allocator() == __nh.get_allocator());
>>
>>     auto __code = this->_M_hash_code(__nh._M_key());
>> -        auto __node = std::exchange(__nh._M_ptr, nullptr);
>> -        // FIXME: this deallocates the node on exception.
>> -        __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>> -      }
>> -    return __ret;
>> +    return _M_insert_multi_node(__hint._M_cur, __code,
>> +                  [&__nh]()
>> +                  { return std::exchange(__nh._M_ptr, nullptr); });
>
> Now that the call won't deallocate anything, we can fix the FIXME by
> just delaying resetting __nh._M_ptr until the call has returned:
>
>        auto __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>        __nh._M_ptr = nullptr;
>        return __ret;
>
>
>>       }
>>
>> +    private:
>>       /// Extract a node.
>
> We don't want a Doxygen "///" comment on a private function, just "//"
> is OK. In this case I don't think we need any comment, the
> _M_extract_node name is clear enough. Saying "extract a node" adds
> nothing.
>
>
>>       node_type
>> -      extract(const_iterator __pos)
>> +      _M_extract_node(size_t __bkt, __node_base* __prev_n)
>>       {
>> -    __node_type* __n = __pos._M_cur;
>> -    size_t __bkt = _M_bucket_index(__n);
>> -
>> -    // Look for previous node to unlink it from the erased one, this
>> -    // is why we need buckets to contain the before begin to make
>> -    // this search fast.
>> -    __node_base* __prev_n = _M_get_previous_node(__bkt, __n);
>> -
>> +    __node_type* __n = static_cast<__node_type*>(__prev_n->_M_nxt);
>>     if (__prev_n == _M_buckets[__bkt])
>>       _M_remove_bucket_begin(__bkt, __n->_M_next(),
>>          __n->_M_nxt ? _M_bucket_index(__n->_M_next()) : 0);
>> @@ -861,14 +853,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>     return { __n, this->_M_node_allocator() };
>>       }
>>
>> +    public:
>> +      /// Extract a node.
>> +      node_type
>> +      extract(const_iterator __pos)
>> +      {
>> +    size_t __bkt = _M_bucket_index(__pos._M_cur);
>> +    return _M_extract_node(__bkt, _M_get_previous_node(__bkt, 
>> __pos._M_cur));
>> +      }
>> +
>>       /// Extract a node.
>>       node_type
>>       extract(const _Key& __k)
>>       {
>>     node_type __nh;
>> -    auto __pos = find(__k);
>> -    if (__pos != end())
>> -      __nh = extract(const_iterator(__pos));
>> +    __hash_code __code = this->_M_hash_code(__k);
>> +    std::size_t __bkt = _M_bucket_index(__k, __code);
>> +    if (__node_base* __prev_node = _M_find_before_node(__bkt, __k, 
>> __code))
>> +      __nh = _M_extract_node(__bkt, __prev_node);
>>     return __nh;
>>       }
>
> Looks good.
>
>> @@ -885,14 +887,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>       for (auto __i = __src.begin(), __end = __src.end(); __i != __end;)
>>         {
>>           auto __pos = __i++;
>> -          const key_type& __k = 
>> this->_M_extract()(__pos._M_cur->_M_v());
>> +          const key_type& __k = this->_M_extract()(*__pos);
>>           __hash_code __code = this->_M_hash_code(__k);
>>           size_type __bkt = _M_bucket_index(__k, __code);
>>           if (_M_find_node(__bkt, __k, __code) == nullptr)
>>         {
>>           auto __nh = __src.extract(__pos);
>> -          _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
>> -          __nh._M_ptr = nullptr;
>> +          _M_insert_unique_node(__k, __bkt, __code,
>> +                [&__nh]()
>> +                { return std::exchange(__nh._M_ptr, nullptr); },
>> +                __n_elt);
>
> Again, the old code seems fine. Just pass in __k.
>
>>           __n_elt = 1;
>>         }
>>           else if (__n_elt != 1)
>> @@ -1634,31 +1638,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>       -> pair<iterator, bool>
>>       {
>>     // First build the node to get access to the hash code
>> -    __node_type* __node
>> -      = this->_M_allocate_node(std::forward<_Args>(__args)...);
>> -    const key_type& __k = this->_M_extract()(__node->_M_v());
>> -    __hash_code __code;
>> -    __try
>> -      {
>> -        __code = this->_M_hash_code(__k);
>> -      }
>> -    __catch(...)
>> -      {
>> -        this->_M_deallocate_node(__node);
>
> Aside: It's confusing that _M_allocate_node actually allocates a node
> and constructs an element, and _M_deallocate_node destroys the element
> and deallocates the node (and we have _M_deallocate_node_ptr which
> just does the deallocation part). We should add comments explaining
> that.
>
>
>> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>> b/libstdc++-v3/include/bits/hashtable_policy.h
>> index 878154ae210..ddcff0ec9d0 100644
>> --- a/libstdc++-v3/include/bits/hashtable_policy.h
>> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
>> @@ -170,6 +170,40 @@ namespace __detail
>>       __hashtable_alloc& _M_h;
>>     };
>>
>> +  // Node smart pointer to benefit from RAII.
>> +  template<typename _NodeAlloc>
>> +    struct _NodeSmartPointer
>> +    {
>> +    private:
>> +      using __hashtable_alloc = _Hashtable_alloc<_NodeAlloc>;
>> +      using __node_type = typename __hashtable_alloc::__node_type;
>> +
>> +    public:
>> +      _NodeSmartPointer(__hashtable_alloc& __h, __node_type* __node) 
>> noexcept
>> +      : _M_h(__h), _M_node(__node) { }
>> +
>> +      ~_NodeSmartPointer()
>> +      {
>> +    if (_M_node)
>> +      _M_h._M_deallocate_node(_M_node);
>> +      }
>> +
>> +      __node_type*
>> +      _M_get() const { return _M_node; }
>> +
>> +      __node_type*
>> +      _M_release() noexcept
>> +      {
>> +    __node_type* __tmp = _M_node;
>> +    _M_node = nullptr;
>> +    return __tmp;
>> +      }
>> +
>> +    private:
>> +      __hashtable_alloc& _M_h;
>> +      __node_type* _M_node;
>> +    };
>
> This can be defined as a member of _Hashtable, and can be much
> simpler:
>
>      // Simple RAII type for managing a node containing an element
>      struct _Scoped_node
>      {
>     _Scoped_node(__hashtable_alloc* __h, __node_type* __n)
>     : _M_h(__h), _M_node(__n) { }
>
>     ~_Scoped_node() { if (_M_node) _M_h->_M_deallocate_node(_M_node); };
>
>     _Scoped_node(const _Scoped_node&) = delete;
>     _Scoped_node& operator=(const _Scoped_node&) = delete;
>
>     __hashtable_alloc* _M_h;
>     __node_type* _M_node;
>      };
>
> I've attached a patch implementing these suggested changes. I think
> it's simpler this way, what do you think?

I think it is much better than what I proposed.


>
> (This patch is generated with 'git diff -b' so whitespace changes
> aren't shown, so this shouldn't be applied directly. I can send
> another version with the whitespace changes that is suitable to
> commit).
>
Mine was generated with 'git diff -w' for the same reason.

You can send the full patch to me if you prefer but feel free to commit 
it yourself.

François

Jonathan Wakely June 17, 2019, 10:28 a.m. UTC | #6

On 07/06/19 18:39 +0200, François Dumont wrote:
>On 6/5/19 6:22 PM, Jonathan Wakely wrote:
>>On 04/06/19 19:19 +0200, François Dumont wrote:
>>>Hi
>>>
>>>    Here is a patch to enhance the _Hashtable extract node API and 
>>>fix a FIXME request.
>>>
>>>    The enhancement to the extract node Api is that extract(const 
>>>key_type&) do not call extract(const_iterator) anymore. Doing so 
>>>we had to loop again through bucket nodes to find the previous 
>>>node to the one to extract. Even if a bucket shall not contain 
>>>many nodes (in unique key mode) it's easy to avoid it.
>>
>>Nice.
>>
>>>    To fix the FIXME I introduced a node smart pointer type 
>>>managing the node lifetime. The node is extracted from this smart 
>>>pointer only when there can't be any exception raised. In the 
>>>context of the node extract api the node handle is considered as a 
>>>smart pointer. So the node handle will remain owner of the node in 
>>>case of exception when reinserting it, I hope it is the expected 
>>>behavior.
>>
>>Yes, that's right.
>>
>>I was going to suggest just using the node handle type instead of
>>inventing a new smart pointer, but the handle type uses std::optional
>>so isn't available for C++11/14.
>
>I considered it too, or even use shared_ptr but thought it was overkill.
>
>>
>>
>>>    * include/bits/hashtable_policy.h
>>>    (struct _NodeSmartPointer<_NodeAlloc>): New.
>>>    (_Map_base<>::operator[](const key_type&)): Use latter, adapt.
>>>    (_Map_base<>::operator[](key_type&&)): Likewise.
>>>    * include/bits/hashtable.h
>>>    (_Hashtable<>::__node_sp_t): New.
>>>    (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code,
>>>    __node_type*, size_type)): Replace by...
>>>(_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&,
>>>    size_type, __hash_code, const _NodeAccessor&, size_type)): ...that.
>>>    (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code,
>>>    __node_type*)): Replace by...
>>>(_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*,
>>>    __hash_code, const _NodeAccessor&)): ...that.
>>>    (_Hashtable<>::_M_reinsert_node): Adapt.
>>>    (_Hashtable<>::_M_reinsert_node_multi): Adapt.
>>>    (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New.
>>>    (_Hashtable<>::extract(const_iterator)): Use latter.
>>>    (_Hashtable<>::extract(const _Key&)): Likewise.
>>>    (_Hashtable<>::_M_merge_unique): Adapt.
>>>    (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt.
>>>    (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type,
>>>    _Args&&...)): Adapt.
>>>
>>>Tested under Linux x86_64.
>>>
>>>Ok to commit ?
>>>
>>>François
>>>
>>
>>>diff --git a/libstdc++-v3/include/bits/hashtable.h 
>>>b/libstdc++-v3/include/bits/hashtable.h
>>>index e2e3f016a35..307865b96bf 100644
>>>--- a/libstdc++-v3/include/bits/hashtable.h
>>>+++ b/libstdc++-v3/include/bits/hashtable.h
>>>@@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      using __hash_cached = typename __traits_type::__hash_cached;
>>>      using __node_type = __detail::_Hash_node<_Value, 
>>>__hash_cached::value>;
>>>      using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>;
>>>+      using __node_sp_t = 
>>>__detail::_NodeSmartPointer<__node_alloc_type>;
>>>
>>>      using __hashtable_alloc = 
>>>__detail::_Hashtable_alloc<__node_alloc_type>;
>>>
>>>@@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      __node_base*
>>>      _M_get_previous_node(size_type __bkt, __node_base* __n);
>>>
>>>-      // Insert node with hash code __code, in bucket bkt if no 
>>>rehash (assumes
>>>-      // no element with its key already present). Take ownership 
>>>of the node,
>>>-      // deallocate it on exception.
>>>+      // Insert node with key __k and hash code __code, in bucket 
>>>__bkt if no
>>>+      // rehash (assumes no element with its key already present).
>>>+      template<typename _NodeAccessor>
>>>    iterator
>>>-      _M_insert_unique_node(size_type __bkt, __hash_code __code,
>>>-                __node_type* __n, size_type __n_elt = 1);
>>>+    _M_insert_unique_node(const key_type& __k, size_type __bkt,
>>>+                  __hash_code __code, const _NodeAccessor&,
>>>+                  size_type __n_elt = 1);
>>>
>>>-      // Insert node with hash code __code. Take ownership of the node,
>>>-      // deallocate it on exception.
>>>+      // Insert node with hash code __code.
>>>+      template<typename _NodeAccessor>
>>>    iterator
>>>-      _M_insert_multi_node(__node_type* __hint,
>>>-               __hash_code __code, __node_type* __n);
>>>+    _M_insert_multi_node(__node_type* __hint, __hash_code __code,
>>>+                 const _NodeAccessor& __node_accessor);
>>
>>It looks like most times you call these functions you pass an
>>identical lambda expression, but each of those lambda expressions will
>>create a unique type. That means you create different instantiations
>>of the function templates even though they do exactly the same thing.
>>
>>That's just generating multiple copies of identical code. Passing in a
>>function object to provide the node pointer doesn't really seem
>>necessary anyway, so if it results in larger executables it's really
>>not desirable.
>
>
>Maybe one the reasons we have a difference when running performance 
>tests in -03. I don't know why I abused so much of lambdas when I see 
>your proposal.
>
>
>>
>>The attached patch still just passes in a node pointer (which the
>>function takes ownership of, unless it throws). Because the semantics
>>of _M_insert_multi_node change, we need the symbol name to change as
>>well (so old code linking to the old symbol doesn't find the new
>>definition with different semantics). Passing in the key makes it a
>>different symbol (in almost all cases the caller has the key available
>>already anyway).
>Very nice.
>>
>>An alternative design would be for _M_insert_(unique|multi)_node to
>>take a reference to the RAII type and set its _M_node member to null
>>after taking ownership of that pointer. That would be more explicit
>>about the ownership transfer, however it would require changes to the
>>_M_reinsert_node and _M_reinsert_node_multi functions which don't use
>>the RAII type (because the __node_type* is already owned by a node
>>handle).
>>
>>>
>>>      template<typename... _Args>
>>>    std::pair<iterator, bool>
>>>@@ -805,9 +807,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>          }
>>>        else
>>>          {
>>>-        __ret.position
>>>-          = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
>>>-        __nh._M_ptr = nullptr;
>>>+        __ret.position = _M_insert_unique_node(__k, __bkt, __code,
>>>+                [&__nh]()
>>>+                { return std::exchange(__nh._M_ptr, nullptr); });
>>
>>The existing code here can just be changed to pass in __k, but still
>>set __nh._M_ptr to null after the call returns.
>>
>>i.e.
>>
>>        __ret.position
>>          = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
>>        __nh._M_ptr = nullptr;
>>
>>
>>>        __ret.inserted = true;
>>>          }
>>>      }
>>>@@ -818,33 +820,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      iterator
>>>      _M_reinsert_node_multi(const_iterator __hint, node_type&& __nh)
>>>      {
>>>-    iterator __ret;
>>>    if (__nh.empty())
>>>-      __ret = end();
>>>-    else
>>>-      {
>>>+      return end();
>>>+
>>>    __glibcxx_assert(get_allocator() == __nh.get_allocator());
>>>
>>>    auto __code = this->_M_hash_code(__nh._M_key());
>>>-        auto __node = std::exchange(__nh._M_ptr, nullptr);
>>>-        // FIXME: this deallocates the node on exception.
>>>-        __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>>>-      }
>>>-    return __ret;
>>>+    return _M_insert_multi_node(__hint._M_cur, __code,
>>>+                  [&__nh]()
>>>+                  { return std::exchange(__nh._M_ptr, nullptr); });
>>
>>Now that the call won't deallocate anything, we can fix the FIXME by
>>just delaying resetting __nh._M_ptr until the call has returned:
>>
>>       auto __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>>       __nh._M_ptr = nullptr;
>>       return __ret;
>>
>>
>>>      }
>>>
>>>+    private:
>>>      /// Extract a node.
>>
>>We don't want a Doxygen "///" comment on a private function, just "//"
>>is OK. In this case I don't think we need any comment, the
>>_M_extract_node name is clear enough. Saying "extract a node" adds
>>nothing.
>>
>>
>>>      node_type
>>>-      extract(const_iterator __pos)
>>>+      _M_extract_node(size_t __bkt, __node_base* __prev_n)
>>>      {
>>>-    __node_type* __n = __pos._M_cur;
>>>-    size_t __bkt = _M_bucket_index(__n);
>>>-
>>>-    // Look for previous node to unlink it from the erased one, this
>>>-    // is why we need buckets to contain the before begin to make
>>>-    // this search fast.
>>>-    __node_base* __prev_n = _M_get_previous_node(__bkt, __n);
>>>-
>>>+    __node_type* __n = static_cast<__node_type*>(__prev_n->_M_nxt);
>>>    if (__prev_n == _M_buckets[__bkt])
>>>      _M_remove_bucket_begin(__bkt, __n->_M_next(),
>>>         __n->_M_nxt ? _M_bucket_index(__n->_M_next()) : 0);
>>>@@ -861,14 +853,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>    return { __n, this->_M_node_allocator() };
>>>      }
>>>
>>>+    public:
>>>+      /// Extract a node.
>>>+      node_type
>>>+      extract(const_iterator __pos)
>>>+      {
>>>+    size_t __bkt = _M_bucket_index(__pos._M_cur);
>>>+    return _M_extract_node(__bkt, _M_get_previous_node(__bkt, 
>>>__pos._M_cur));
>>>+      }
>>>+
>>>      /// Extract a node.
>>>      node_type
>>>      extract(const _Key& __k)
>>>      {
>>>    node_type __nh;
>>>-    auto __pos = find(__k);
>>>-    if (__pos != end())
>>>-      __nh = extract(const_iterator(__pos));
>>>+    __hash_code __code = this->_M_hash_code(__k);
>>>+    std::size_t __bkt = _M_bucket_index(__k, __code);
>>>+    if (__node_base* __prev_node = _M_find_before_node(__bkt, 
>>>__k, __code))
>>>+      __nh = _M_extract_node(__bkt, __prev_node);
>>>    return __nh;
>>>      }
>>
>>Looks good.
>>
>>>@@ -885,14 +887,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      for (auto __i = __src.begin(), __end = __src.end(); __i != __end;)
>>>        {
>>>          auto __pos = __i++;
>>>-          const key_type& __k = 
>>>this->_M_extract()(__pos._M_cur->_M_v());
>>>+          const key_type& __k = this->_M_extract()(*__pos);
>>>          __hash_code __code = this->_M_hash_code(__k);
>>>          size_type __bkt = _M_bucket_index(__k, __code);
>>>          if (_M_find_node(__bkt, __k, __code) == nullptr)
>>>        {
>>>          auto __nh = __src.extract(__pos);
>>>-          _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
>>>-          __nh._M_ptr = nullptr;
>>>+          _M_insert_unique_node(__k, __bkt, __code,
>>>+                [&__nh]()
>>>+                { return std::exchange(__nh._M_ptr, nullptr); },
>>>+                __n_elt);
>>
>>Again, the old code seems fine. Just pass in __k.
>>
>>>          __n_elt = 1;
>>>        }
>>>          else if (__n_elt != 1)
>>>@@ -1634,31 +1638,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      -> pair<iterator, bool>
>>>      {
>>>    // First build the node to get access to the hash code
>>>-    __node_type* __node
>>>-      = this->_M_allocate_node(std::forward<_Args>(__args)...);
>>>-    const key_type& __k = this->_M_extract()(__node->_M_v());
>>>-    __hash_code __code;
>>>-    __try
>>>-      {
>>>-        __code = this->_M_hash_code(__k);
>>>-      }
>>>-    __catch(...)
>>>-      {
>>>-        this->_M_deallocate_node(__node);
>>
>>Aside: It's confusing that _M_allocate_node actually allocates a node
>>and constructs an element, and _M_deallocate_node destroys the element
>>and deallocates the node (and we have _M_deallocate_node_ptr which
>>just does the deallocation part). We should add comments explaining
>>that.
>>
>>
>>>diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>>>b/libstdc++-v3/include/bits/hashtable_policy.h
>>>index 878154ae210..ddcff0ec9d0 100644
>>>--- a/libstdc++-v3/include/bits/hashtable_policy.h
>>>+++ b/libstdc++-v3/include/bits/hashtable_policy.h
>>>@@ -170,6 +170,40 @@ namespace __detail
>>>      __hashtable_alloc& _M_h;
>>>    };
>>>
>>>+  // Node smart pointer to benefit from RAII.
>>>+  template<typename _NodeAlloc>
>>>+    struct _NodeSmartPointer
>>>+    {
>>>+    private:
>>>+      using __hashtable_alloc = _Hashtable_alloc<_NodeAlloc>;
>>>+      using __node_type = typename __hashtable_alloc::__node_type;
>>>+
>>>+    public:
>>>+      _NodeSmartPointer(__hashtable_alloc& __h, __node_type* 
>>>__node) noexcept
>>>+      : _M_h(__h), _M_node(__node) { }
>>>+
>>>+      ~_NodeSmartPointer()
>>>+      {
>>>+    if (_M_node)
>>>+      _M_h._M_deallocate_node(_M_node);
>>>+      }
>>>+
>>>+      __node_type*
>>>+      _M_get() const { return _M_node; }
>>>+
>>>+      __node_type*
>>>+      _M_release() noexcept
>>>+      {
>>>+    __node_type* __tmp = _M_node;
>>>+    _M_node = nullptr;
>>>+    return __tmp;
>>>+      }
>>>+
>>>+    private:
>>>+      __hashtable_alloc& _M_h;
>>>+      __node_type* _M_node;
>>>+    };
>>
>>This can be defined as a member of _Hashtable, and can be much
>>simpler:
>>
>>     // Simple RAII type for managing a node containing an element
>>     struct _Scoped_node
>>     {
>>    _Scoped_node(__hashtable_alloc* __h, __node_type* __n)
>>    : _M_h(__h), _M_node(__n) { }
>>
>>    ~_Scoped_node() { if (_M_node) _M_h->_M_deallocate_node(_M_node); };
>>
>>    _Scoped_node(const _Scoped_node&) = delete;
>>    _Scoped_node& operator=(const _Scoped_node&) = delete;
>>
>>    __hashtable_alloc* _M_h;
>>    __node_type* _M_node;
>>     };
>>
>>I've attached a patch implementing these suggested changes. I think
>>it's simpler this way, what do you think?
>
>I think it is much better than what I proposed.
>
>
>>
>>(This patch is generated with 'git diff -b' so whitespace changes
>>aren't shown, so this shouldn't be applied directly. I can send
>>another version with the whitespace changes that is suitable to
>>commit).
>>
>Mine was generated with 'git diff -w' for the same reason.
>
>You can send the full patch to me if you prefer but feel free to 
>commit it yourself.

Here's what I've committed. Compared to the previous patch this adds a
constructor to _Scoped_node that creates the new node, so the caller
doesn't have to create it:

       // Allocate a node and construct an element within it.
       template<typename... _Args>
         _Scoped_node(__hashtable_alloc* __h, _Args&&... __args)
         : _M_h(__h),
           _M_node(__h->_M_allocate_node(std::forward<_Args>(__args)...))
         { }

Tested x86_64-linux, committed to trunk.

François Dumont June 18, 2019, 5:52 a.m. UTC | #7

A small regression noticed while merging.

We shouldn't keep on using a moved-from key_type instance.

Ok to commit ? Feel free to do it if you prefer, I'll do so at end of 
Europe day otherwise.

François


On 6/17/19 12:28 PM, Jonathan Wakely wrote:
> On 07/06/19 18:39 +0200, François Dumont wrote:
>> On 6/5/19 6:22 PM, Jonathan Wakely wrote:
>>> On 04/06/19 19:19 +0200, François Dumont wrote:
>>>> Hi
>>>>
>>>>     Here is a patch to enhance the _Hashtable extract node API and 
>>>> fix a FIXME request.
>>>>
>>>>     The enhancement to the extract node Api is that extract(const 
>>>> key_type&) do not call extract(const_iterator) anymore. Doing so we 
>>>> had to loop again through bucket nodes to find the previous node to 
>>>> the one to extract. Even if a bucket shall not contain many nodes 
>>>> (in unique key mode) it's easy to avoid it.
>>>
>>> Nice.
>>>
>>>>     To fix the FIXME I introduced a node smart pointer type 
>>>> managing the node lifetime. The node is extracted from this smart 
>>>> pointer only when there can't be any exception raised. In the 
>>>> context of the node extract api the node handle is considered as a 
>>>> smart pointer. So the node handle will remain owner of the node in 
>>>> case of exception when reinserting it, I hope it is the expected 
>>>> behavior.
>>>
>>> Yes, that's right.
>>>
>>> I was going to suggest just using the node handle type instead of
>>> inventing a new smart pointer, but the handle type uses std::optional
>>> so isn't available for C++11/14.
>>
>> I considered it too, or even use shared_ptr but thought it was overkill.
>>
>>>
>>>
>>>>     * include/bits/hashtable_policy.h
>>>>     (struct _NodeSmartPointer<_NodeAlloc>): New.
>>>>     (_Map_base<>::operator[](const key_type&)): Use latter, adapt.
>>>>     (_Map_base<>::operator[](key_type&&)): Likewise.
>>>>     * include/bits/hashtable.h
>>>>     (_Hashtable<>::__node_sp_t): New.
>>>>     (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code,
>>>>     __node_type*, size_type)): Replace by...
>>>> (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&,
>>>>     size_type, __hash_code, const _NodeAccessor&, size_type)): 
>>>> ...that.
>>>>     (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code,
>>>>     __node_type*)): Replace by...
>>>> (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*,
>>>>     __hash_code, const _NodeAccessor&)): ...that.
>>>>     (_Hashtable<>::_M_reinsert_node): Adapt.
>>>>     (_Hashtable<>::_M_reinsert_node_multi): Adapt.
>>>>     (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New.
>>>>     (_Hashtable<>::extract(const_iterator)): Use latter.
>>>>     (_Hashtable<>::extract(const _Key&)): Likewise.
>>>>     (_Hashtable<>::_M_merge_unique): Adapt.
>>>>     (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt.
>>>> (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type,
>>>>     _Args&&...)): Adapt.
>>>>
>>>> Tested under Linux x86_64.
>>>>
>>>> Ok to commit ?
>>>>
>>>> François
>>>>
>>>
>>>> diff --git a/libstdc++-v3/include/bits/hashtable.h 
>>>> b/libstdc++-v3/include/bits/hashtable.h
>>>> index e2e3f016a35..307865b96bf 100644
>>>> --- a/libstdc++-v3/include/bits/hashtable.h
>>>> +++ b/libstdc++-v3/include/bits/hashtable.h
>>>> @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>       using __hash_cached = typename __traits_type::__hash_cached;
>>>>       using __node_type = __detail::_Hash_node<_Value, 
>>>> __hash_cached::value>;
>>>>       using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>;
>>>> +      using __node_sp_t = 
>>>> __detail::_NodeSmartPointer<__node_alloc_type>;
>>>>
>>>>       using __hashtable_alloc = 
>>>> __detail::_Hashtable_alloc<__node_alloc_type>;
>>>>
>>>> @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>       __node_base*
>>>>       _M_get_previous_node(size_type __bkt, __node_base* __n);
>>>>
>>>> -      // Insert node with hash code __code, in bucket bkt if no 
>>>> rehash (assumes
>>>> -      // no element with its key already present). Take ownership 
>>>> of the node,
>>>> -      // deallocate it on exception.
>>>> +      // Insert node with key __k and hash code __code, in bucket 
>>>> __bkt if no
>>>> +      // rehash (assumes no element with its key already present).
>>>> +      template<typename _NodeAccessor>
>>>>     iterator
>>>> -      _M_insert_unique_node(size_type __bkt, __hash_code __code,
>>>> -                __node_type* __n, size_type __n_elt = 1);
>>>> +    _M_insert_unique_node(const key_type& __k, size_type __bkt,
>>>> +                  __hash_code __code, const _NodeAccessor&,
>>>> +                  size_type __n_elt = 1);
>>>>
>>>> -      // Insert node with hash code __code. Take ownership of the 
>>>> node,
>>>> -      // deallocate it on exception.
>>>> +      // Insert node with hash code __code.
>>>> +      template<typename _NodeAccessor>
>>>>     iterator
>>>> -      _M_insert_multi_node(__node_type* __hint,
>>>> -               __hash_code __code, __node_type* __n);
>>>> +    _M_insert_multi_node(__node_type* __hint, __hash_code __code,
>>>> +                 const _NodeAccessor& __node_accessor);
>>>
>>> It looks like most times you call these functions you pass an
>>> identical lambda expression, but each of those lambda expressions will
>>> create a unique type. That means you create different instantiations
>>> of the function templates even though they do exactly the same thing.
>>>
>>> That's just generating multiple copies of identical code. Passing in a
>>> function object to provide the node pointer doesn't really seem
>>> necessary anyway, so if it results in larger executables it's really
>>> not desirable.
>>
>>
>> Maybe one the reasons we have a difference when running performance 
>> tests in -03. I don't know why I abused so much of lambdas when I see 
>> your proposal.
>>
>>
>>>
>>> The attached patch still just passes in a node pointer (which the
>>> function takes ownership of, unless it throws). Because the semantics
>>> of _M_insert_multi_node change, we need the symbol name to change as
>>> well (so old code linking to the old symbol doesn't find the new
>>> definition with different semantics). Passing in the key makes it a
>>> different symbol (in almost all cases the caller has the key available
>>> already anyway).
>> Very nice.
>>>
>>> An alternative design would be for _M_insert_(unique|multi)_node to
>>> take a reference to the RAII type and set its _M_node member to null
>>> after taking ownership of that pointer. That would be more explicit
>>> about the ownership transfer, however it would require changes to the
>>> _M_reinsert_node and _M_reinsert_node_multi functions which don't use
>>> the RAII type (because the __node_type* is already owned by a node
>>> handle).
>>>
>>>>
>>>>       template<typename... _Args>
>>>>     std::pair<iterator, bool>
>>>> @@ -805,9 +807,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>           }
>>>>         else
>>>>           {
>>>> -        __ret.position
>>>> -          = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
>>>> -        __nh._M_ptr = nullptr;
>>>> +        __ret.position = _M_insert_unique_node(__k, __bkt, __code,
>>>> +                [&__nh]()
>>>> +                { return std::exchange(__nh._M_ptr, nullptr); });
>>>
>>> The existing code here can just be changed to pass in __k, but still
>>> set __nh._M_ptr to null after the call returns.
>>>
>>> i.e.
>>>
>>>         __ret.position
>>>           = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
>>>         __nh._M_ptr = nullptr;
>>>
>>>
>>>>         __ret.inserted = true;
>>>>           }
>>>>       }
>>>> @@ -818,33 +820,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>       iterator
>>>>       _M_reinsert_node_multi(const_iterator __hint, node_type&& __nh)
>>>>       {
>>>> -    iterator __ret;
>>>>     if (__nh.empty())
>>>> -      __ret = end();
>>>> -    else
>>>> -      {
>>>> +      return end();
>>>> +
>>>>     __glibcxx_assert(get_allocator() == __nh.get_allocator());
>>>>
>>>>     auto __code = this->_M_hash_code(__nh._M_key());
>>>> -        auto __node = std::exchange(__nh._M_ptr, nullptr);
>>>> -        // FIXME: this deallocates the node on exception.
>>>> -        __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>>>> -      }
>>>> -    return __ret;
>>>> +    return _M_insert_multi_node(__hint._M_cur, __code,
>>>> +                  [&__nh]()
>>>> +                  { return std::exchange(__nh._M_ptr, nullptr); });
>>>
>>> Now that the call won't deallocate anything, we can fix the FIXME by
>>> just delaying resetting __nh._M_ptr until the call has returned:
>>>
>>>        auto __ret = _M_insert_multi_node(__hint._M_cur, __code, 
>>> __node);
>>>        __nh._M_ptr = nullptr;
>>>        return __ret;
>>>
>>>
>>>>       }
>>>>
>>>> +    private:
>>>>       /// Extract a node.
>>>
>>> We don't want a Doxygen "///" comment on a private function, just "//"
>>> is OK. In this case I don't think we need any comment, the
>>> _M_extract_node name is clear enough. Saying "extract a node" adds
>>> nothing.
>>>
>>>
>>>>       node_type
>>>> -      extract(const_iterator __pos)
>>>> +      _M_extract_node(size_t __bkt, __node_base* __prev_n)
>>>>       {
>>>> -    __node_type* __n = __pos._M_cur;
>>>> -    size_t __bkt = _M_bucket_index(__n);
>>>> -
>>>> -    // Look for previous node to unlink it from the erased one, this
>>>> -    // is why we need buckets to contain the before begin to make
>>>> -    // this search fast.
>>>> -    __node_base* __prev_n = _M_get_previous_node(__bkt, __n);
>>>> -
>>>> +    __node_type* __n = static_cast<__node_type*>(__prev_n->_M_nxt);
>>>>     if (__prev_n == _M_buckets[__bkt])
>>>>       _M_remove_bucket_begin(__bkt, __n->_M_next(),
>>>>          __n->_M_nxt ? _M_bucket_index(__n->_M_next()) : 0);
>>>> @@ -861,14 +853,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>     return { __n, this->_M_node_allocator() };
>>>>       }
>>>>
>>>> +    public:
>>>> +      /// Extract a node.
>>>> +      node_type
>>>> +      extract(const_iterator __pos)
>>>> +      {
>>>> +    size_t __bkt = _M_bucket_index(__pos._M_cur);
>>>> +    return _M_extract_node(__bkt, _M_get_previous_node(__bkt, 
>>>> __pos._M_cur));
>>>> +      }
>>>> +
>>>>       /// Extract a node.
>>>>       node_type
>>>>       extract(const _Key& __k)
>>>>       {
>>>>     node_type __nh;
>>>> -    auto __pos = find(__k);
>>>> -    if (__pos != end())
>>>> -      __nh = extract(const_iterator(__pos));
>>>> +    __hash_code __code = this->_M_hash_code(__k);
>>>> +    std::size_t __bkt = _M_bucket_index(__k, __code);
>>>> +    if (__node_base* __prev_node = _M_find_before_node(__bkt, __k, 
>>>> __code))
>>>> +      __nh = _M_extract_node(__bkt, __prev_node);
>>>>     return __nh;
>>>>       }
>>>
>>> Looks good.
>>>
>>>> @@ -885,14 +887,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>       for (auto __i = __src.begin(), __end = __src.end(); __i != 
>>>> __end;)
>>>>         {
>>>>           auto __pos = __i++;
>>>> -          const key_type& __k = 
>>>> this->_M_extract()(__pos._M_cur->_M_v());
>>>> +          const key_type& __k = this->_M_extract()(*__pos);
>>>>           __hash_code __code = this->_M_hash_code(__k);
>>>>           size_type __bkt = _M_bucket_index(__k, __code);
>>>>           if (_M_find_node(__bkt, __k, __code) == nullptr)
>>>>         {
>>>>           auto __nh = __src.extract(__pos);
>>>> -          _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
>>>> -          __nh._M_ptr = nullptr;
>>>> +          _M_insert_unique_node(__k, __bkt, __code,
>>>> +                [&__nh]()
>>>> +                { return std::exchange(__nh._M_ptr, nullptr); },
>>>> +                __n_elt);
>>>
>>> Again, the old code seems fine. Just pass in __k.
>>>
>>>>           __n_elt = 1;
>>>>         }
>>>>           else if (__n_elt != 1)
>>>> @@ -1634,31 +1638,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>       -> pair<iterator, bool>
>>>>       {
>>>>     // First build the node to get access to the hash code
>>>> -    __node_type* __node
>>>> -      = this->_M_allocate_node(std::forward<_Args>(__args)...);
>>>> -    const key_type& __k = this->_M_extract()(__node->_M_v());
>>>> -    __hash_code __code;
>>>> -    __try
>>>> -      {
>>>> -        __code = this->_M_hash_code(__k);
>>>> -      }
>>>> -    __catch(...)
>>>> -      {
>>>> -        this->_M_deallocate_node(__node);
>>>
>>> Aside: It's confusing that _M_allocate_node actually allocates a node
>>> and constructs an element, and _M_deallocate_node destroys the element
>>> and deallocates the node (and we have _M_deallocate_node_ptr which
>>> just does the deallocation part). We should add comments explaining
>>> that.
>>>
>>>
>>>> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>>>> b/libstdc++-v3/include/bits/hashtable_policy.h
>>>> index 878154ae210..ddcff0ec9d0 100644
>>>> --- a/libstdc++-v3/include/bits/hashtable_policy.h
>>>> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
>>>> @@ -170,6 +170,40 @@ namespace __detail
>>>>       __hashtable_alloc& _M_h;
>>>>     };
>>>>
>>>> +  // Node smart pointer to benefit from RAII.
>>>> +  template<typename _NodeAlloc>
>>>> +    struct _NodeSmartPointer
>>>> +    {
>>>> +    private:
>>>> +      using __hashtable_alloc = _Hashtable_alloc<_NodeAlloc>;
>>>> +      using __node_type = typename __hashtable_alloc::__node_type;
>>>> +
>>>> +    public:
>>>> +      _NodeSmartPointer(__hashtable_alloc& __h, __node_type* 
>>>> __node) noexcept
>>>> +      : _M_h(__h), _M_node(__node) { }
>>>> +
>>>> +      ~_NodeSmartPointer()
>>>> +      {
>>>> +    if (_M_node)
>>>> +      _M_h._M_deallocate_node(_M_node);
>>>> +      }
>>>> +
>>>> +      __node_type*
>>>> +      _M_get() const { return _M_node; }
>>>> +
>>>> +      __node_type*
>>>> +      _M_release() noexcept
>>>> +      {
>>>> +    __node_type* __tmp = _M_node;
>>>> +    _M_node = nullptr;
>>>> +    return __tmp;
>>>> +      }
>>>> +
>>>> +    private:
>>>> +      __hashtable_alloc& _M_h;
>>>> +      __node_type* _M_node;
>>>> +    };
>>>
>>> This can be defined as a member of _Hashtable, and can be much
>>> simpler:
>>>
>>>      // Simple RAII type for managing a node containing an element
>>>      struct _Scoped_node
>>>      {
>>>     _Scoped_node(__hashtable_alloc* __h, __node_type* __n)
>>>     : _M_h(__h), _M_node(__n) { }
>>>
>>>     ~_Scoped_node() { if (_M_node) 
>>> _M_h->_M_deallocate_node(_M_node); };
>>>
>>>     _Scoped_node(const _Scoped_node&) = delete;
>>>     _Scoped_node& operator=(const _Scoped_node&) = delete;
>>>
>>>     __hashtable_alloc* _M_h;
>>>     __node_type* _M_node;
>>>      };
>>>
>>> I've attached a patch implementing these suggested changes. I think
>>> it's simpler this way, what do you think?
>>
>> I think it is much better than what I proposed.
>>
>>
>>>
>>> (This patch is generated with 'git diff -b' so whitespace changes
>>> aren't shown, so this shouldn't be applied directly. I can send
>>> another version with the whitespace changes that is suitable to
>>> commit).
>>>
>> Mine was generated with 'git diff -w' for the same reason.
>>
>> You can send the full patch to me if you prefer but feel free to 
>> commit it yourself.
>
> Here's what I've committed. Compared to the previous patch this adds a
> constructor to _Scoped_node that creates the new node, so the caller
> doesn't have to create it:
>
>       // Allocate a node and construct an element within it.
>       template<typename... _Args>
>         _Scoped_node(__hashtable_alloc* __h, _Args&&... __args)
>         : _M_h(__h),
> _M_node(__h->_M_allocate_node(std::forward<_Args>(__args)...))
>         { }
>
> Tested x86_64-linux, committed to trunk.
>
>

Jonathan Wakely June 18, 2019, 10:54 a.m. UTC | #8

On 18/06/19 07:52 +0200, François Dumont wrote:
>A small regression noticed while merging.
>
>We shouldn't keep on using a moved-from key_type instance.
>
>Ok to commit ? Feel free to do it if you prefer, I'll do so at end of 
>Europe day otherwise.

>diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
>index f5809c7443a..7e89e1b44c4 100644
>--- a/libstdc++-v3/include/bits/hashtable_policy.h
>+++ b/libstdc++-v3/include/bits/hashtable_policy.h
>@@ -743,7 +743,8 @@ namespace __detail
> 	std::tuple<>()
>       };
>       auto __pos
>-	= __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node);
>+	= __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()),
>+				     __bkt, __code, __node._M_node);
>       __node._M_node = nullptr;
>       return __pos->second;
>     }

I can't create an example where this causes a problem, because the key
passed to _M_insert_unique_node is never used. So it doesn't matter
that it's been moved from.

So I have to wonder why we just added the key parameter to that
function, if it's never used.

As far as I can tell, it would only be used for a non-default range
hash function, and I don't care about that. Frankly I find the
policy-based _Hashtable completely unmaintainable and I'd gladly get
rid of all of it that isn't needed for the std::unordered_xxx
containers. The non-standard extensions are not used by anybody,
apparently not tested properly (or this regression should have been
noticed) and make the code too complicated.

We're adding new parameters that have to be passed around even though
they're never used by 99.999% of users. No wonder the code is only
fast at -O3.

</rant>

François Dumont June 18, 2019, 8:42 p.m. UTC | #9

On 6/18/19 12:54 PM, Jonathan Wakely wrote:
> On 18/06/19 07:52 +0200, François Dumont wrote:
>> A small regression noticed while merging.
>>
>> We shouldn't keep on using a moved-from key_type instance.
>>
>> Ok to commit ? Feel free to do it if you prefer, I'll do so at end of 
>> Europe day otherwise.
>
>
>> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>> b/libstdc++-v3/include/bits/hashtable_policy.h
>> index f5809c7443a..7e89e1b44c4 100644
>> --- a/libstdc++-v3/include/bits/hashtable_policy.h
>> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
>> @@ -743,7 +743,8 @@ namespace __detail
>>     std::tuple<>()
>>       };
>>       auto __pos
>> -    = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node);
>> +    = 
>> __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()),
>> +                     __bkt, __code, __node._M_node);
>>       __node._M_node = nullptr;
>>       return __pos->second;
>>     }
>
> I can't create an example where this causes a problem, because the key
> passed to _M_insert_unique_node is never used. So it doesn't matter
> that it's been moved from.
>
> So I have to wonder why we just added the key parameter to that
> function, if it's never used.

I think you've been influence by my patch. I was using a "_NodeAccessor" 
which wasn't giving access to the node without taking owership so I 
needed to pass the key properly to compute new bucket index in case of 
rehash.

But with your approach this change to the _M_insert_unique_node was 
simply unecessary so here is a patch to cleanup this part.

Ok to commit ?


>
> As far as I can tell, it would only be used for a non-default range
> hash function, and I don't care about that. Frankly I find the
> policy-based _Hashtable completely unmaintainable and I'd gladly get
> rid of all of it that isn't needed for the std::unordered_xxx
> containers. The non-standard extensions are not used by anybody,
> apparently not tested properly (or this regression should have been
> noticed) and make the code too complicated.
I never consider removing this but it would indeed make maintainance 
easy. I'll add to my TODO.
>
> We're adding new parameters that have to be passed around even though
> they're never used by 99.999% of users. No wonder the code is only
> fast at -O3.
>
> </rant>
>
>
>

Jonathan Wakely June 18, 2019, 10:47 p.m. UTC | #10

On 18/06/19 22:42 +0200, François Dumont wrote:
>On 6/18/19 12:54 PM, Jonathan Wakely wrote:
>>On 18/06/19 07:52 +0200, François Dumont wrote:
>>>A small regression noticed while merging.
>>>
>>>We shouldn't keep on using a moved-from key_type instance.
>>>
>>>Ok to commit ? Feel free to do it if you prefer, I'll do so at end 
>>>of Europe day otherwise.
>>
>>
>>>diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>>>b/libstdc++-v3/include/bits/hashtable_policy.h
>>>index f5809c7443a..7e89e1b44c4 100644
>>>--- a/libstdc++-v3/include/bits/hashtable_policy.h
>>>+++ b/libstdc++-v3/include/bits/hashtable_policy.h
>>>@@ -743,7 +743,8 @@ namespace __detail
>>>    std::tuple<>()
>>>      };
>>>      auto __pos
>>>-    = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node);
>>>+    = __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()),
>>>+                     __bkt, __code, __node._M_node);
>>>      __node._M_node = nullptr;
>>>      return __pos->second;
>>>    }
>>
>>I can't create an example where this causes a problem, because the key
>>passed to _M_insert_unique_node is never used. So it doesn't matter
>>that it's been moved from.
>>
>>So I have to wonder why we just added the key parameter to that
>>function, if it's never used.
>
>I think you've been influence by my patch. I was using a 
>"_NodeAccessor" which wasn't giving access to the node without taking 
>owership so I needed to pass the key properly to compute new bucket 
>index in case of rehash.
>
>But with your approach this change to the _M_insert_unique_node was 
>simply unecessary so here is a patch to cleanup this part.

Ha! I see, thanks. So I should have removed that key_type parameter
again after removing the NodeAccessor stuff.


>Ok to commit ?

No, because that would restore the original signature of the
_M_insert_unique_node function, but it has changed contract. Old
callers who expect that function to delete the node would now leak
memory if an exception is thrown.

If we change the contract of the function we need to change its
mangled name, so that callers expecting the old contract will not use
the new function.

I'll think about the best way to do that ...

François Dumont June 20, 2019, 8:40 p.m. UTC | #11

On 6/19/19 12:47 AM, Jonathan Wakely wrote:
> On 18/06/19 22:42 +0200, François Dumont wrote:
>> On 6/18/19 12:54 PM, Jonathan Wakely wrote:
>>> On 18/06/19 07:52 +0200, François Dumont wrote:
>>>> A small regression noticed while merging.
>>>>
>>>> We shouldn't keep on using a moved-from key_type instance.
>>>>
>>>> Ok to commit ? Feel free to do it if you prefer, I'll do so at end 
>>>> of Europe day otherwise.
>>>
>>>
>>>> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>>>> b/libstdc++-v3/include/bits/hashtable_policy.h
>>>> index f5809c7443a..7e89e1b44c4 100644
>>>> --- a/libstdc++-v3/include/bits/hashtable_policy.h
>>>> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
>>>> @@ -743,7 +743,8 @@ namespace __detail
>>>>     std::tuple<>()
>>>>       };
>>>>       auto __pos
>>>> -    = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node);
>>>> +    = 
>>>> __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()),
>>>> +                     __bkt, __code, __node._M_node);
>>>>       __node._M_node = nullptr;
>>>>       return __pos->second;
>>>>     }
>>>
>>> I can't create an example where this causes a problem, because the key
>>> passed to _M_insert_unique_node is never used. So it doesn't matter
>>> that it's been moved from.
>>>
>>> So I have to wonder why we just added the key parameter to that
>>> function, if it's never used.
>>
>> I think you've been influence by my patch. I was using a 
>> "_NodeAccessor" which wasn't giving access to the node without taking 
>> owership so I needed to pass the key properly to compute new bucket 
>> index in case of rehash.
>>
>> But with your approach this change to the _M_insert_unique_node was 
>> simply unecessary so here is a patch to cleanup this part.
>
> Ha! I see, thanks. So I should have removed that key_type parameter
> again after removing the NodeAccessor stuff.
>
>
>> Ok to commit ?
>
> No, because that would restore the original signature of the
> _M_insert_unique_node function, but it has changed contract. Old
> callers who expect that function to delete the node would now leak
> memory if an exception is thrown.
>
Oh, yes, abi, I tend to forget even if the recent PR 90920 remind me 
about that, sorry.
> If we change the contract of the function we need to change its
> mangled name, so that callers expecting the old contract will not use
> the new function.
>
> I'll think about the best way to do that ...
>
Something like what's attached ?

I still use _GLIBCXX_INLINE_VERSION to tag functions that are kept just 
for abi-compatibility.

Ok to commit after having run tests ?

François

Review Hashtable extract node API

Commit Message

Comments

Patch