Patchwork Unordered container insertion hints

login
register
mail settings
Submitter François Dumont
Date May 15, 2013, 7:49 p.m.
Message ID <5193E6AE.8020306@gmail.com>
Download mbox | patch
Permalink /patch/244153/
State New
Headers show

Comments

François Dumont - May 15, 2013, 7:49 p.m.
Hi

     Here is a patch to consider the hint that users can give to 
enhancement insertion performances. As you can see I only use it for 
unordered_multi* containers to potentially avoid research within the 
bucket nodes.

     Note that I have use a call to _M_equals to avoid a hash code 
computation when we end up inserting after the hint. It is an 
optimization because I consider that _M_equals will be always faster 
than a hash code computation. I think that I will submit an other patch 
later to generalize this when possible to limit the small performance we 
noticed when adopting the new data model (unless performance tests are 
showing me that it is worst).

     I try to document it. If you accept this patch tell me if it is 
with or without the documentation cause I know that my English is not 
good enough. I didn't find out how I can fix the doc URLs regarding 
usage of hints in the std::unordered_* Doxygen comments.

2013-05-20  François Dumont  <fdumont@gcc.gnu.org>

     * include/bits/hashtable_policy.h (_Insert_base): Consider hint in
     insert methods.
     * include/bits/hashtable.h: Likewise.
     * testsuite/23_containers/unordered_multimap/insert/hint.cc: New.
     * doc/xml/manual/containers.xml: Document hitting in unordered
     containers.

François
François Dumont - May 23, 2013, 8:01 p.m.
Some feedback regarding this patch ?

Thanks


On 05/15/2013 09:49 PM, François Dumont wrote:
> Hi
>
>     Here is a patch to consider the hint that users can give to 
> enhancement insertion performances. As you can see I only use it for 
> unordered_multi* containers to potentially avoid research within the 
> bucket nodes.
>
>     Note that I have use a call to _M_equals to avoid a hash code 
> computation when we end up inserting after the hint. It is an 
> optimization because I consider that _M_equals will be always faster 
> than a hash code computation. I think that I will submit an other 
> patch later to generalize this when possible to limit the small 
> performance we noticed when adopting the new data model (unless 
> performance tests are showing me that it is worst).
>
>     I try to document it. If you accept this patch tell me if it is 
> with or without the documentation cause I know that my English is not 
> good enough. I didn't find out how I can fix the doc URLs regarding 
> usage of hints in the std::unordered_* Doxygen comments.
>
> 2013-05-20  François Dumont  <fdumont@gcc.gnu.org>
>
>     * include/bits/hashtable_policy.h (_Insert_base): Consider hint in
>     insert methods.
>     * include/bits/hashtable.h: Likewise.
>     * testsuite/23_containers/unordered_multimap/insert/hint.cc: New.
>     * doc/xml/manual/containers.xml: Document hitting in unordered
>     containers.
>
> François
>
Paolo Carlini - May 23, 2013, 11 p.m.
On 05/23/2013 10:01 PM, François Dumont wrote:
> Some feedback regarding this patch ?
Two quick ones: what if the hint is wrong? I suppose the insertion 
succeeds anyway, it's only a little waste of time, right? Is it possible 
that for instance something throws in that case and would not now (when 
the hint is simply ignored)? In case, check and re-check we are still 
conforming.

In any case, I think it's quite easy to notice if an implementation is 
using the hint in this way or a similar one basing on some simple 
benchmarks, without looking of course at the actual implementation code. 
Do we have any idea what other implementations are doing? Like, eg, they 
invented something for unordered_set and map too? Or a better way to 
exploit the hint for the multi variants? Eventually I suppose we want to 
add a performance testcase to our testsuite.

Thanks!
Paolo.

Patch

Index: include/bits/hashtable.h
===================================================================
--- include/bits/hashtable.h	(revision 198899)
+++ include/bits/hashtable.h	(working copy)
@@ -225,7 +225,6 @@ 
       using __node_base = typename __hashtable_base::__node_base;
       using __bucket_type = typename __hashtable_base::__bucket_type;
       using __ireturn_type = typename __hashtable_base::__ireturn_type;
-      using __iconv_type = typename __hashtable_base::__iconv_type;
 
       using __map_base = __detail::_Map_base<_Key, _Value, _Alloc, _ExtractKey,
 					     _Equal, _H1, _H2, _Hash,
@@ -680,7 +679,8 @@ 
       // Insert node with hash code __code. Take ownership of the node,
       // deallocate it on exception.
       iterator
-      _M_insert_multi_node(__hash_code __code, __node_type* __n);
+      _M_insert_multi_node(__node_type* __hint,
+			   __hash_code __code, __node_type* __n);
 
       template<typename... _Args>
 	std::pair<iterator, bool>
@@ -688,16 +688,39 @@ 
 
       template<typename... _Args>
 	iterator
-	_M_emplace(std::false_type, _Args&&... __args);
+	_M_emplace(std::false_type __uk, _Args&&... __args)
+	{ return _M_emplace(cend(), __uk, std::forward<_Args>(__args)...); }
 
+      // Emplace with hint, useless when keys are unique.
+      template<typename... _Args>
+	iterator
+	_M_emplace(const_iterator, std::true_type __uk, _Args&&... __args)
+	{ return _M_emplace(__uk, std::forward<_Args>(__args)...).first; }
+
+      template<typename... _Args>
+	iterator
+	_M_emplace(const_iterator, std::false_type, _Args&&... __args);
+
       template<typename _Arg>
 	std::pair<iterator, bool>
 	_M_insert(_Arg&&, std::true_type);
 
       template<typename _Arg>
 	iterator
-	_M_insert(_Arg&&, std::false_type);
+	_M_insert(_Arg&& __arg, std::false_type __uk)
+	{ return _M_insert(cend(), std::forward<_Arg>(__arg), __uk); }
 
+      // Insert with hint, not used when keys are unique.
+      template<typename _Arg>
+	iterator
+	_M_insert(const_iterator, _Arg&& __arg, std::true_type __uk)
+	{ return _M_insert(std::forward<_Arg>(__arg), __uk).first; }
+
+      // Insert with hint when keys are not unique.
+      template<typename _Arg>
+	iterator
+	_M_insert(const_iterator, _Arg&&, std::false_type);
+
       size_type
       _M_erase(std::true_type, const key_type&);
 
@@ -716,8 +739,11 @@ 
 
       template<typename... _Args>
 	iterator
-	emplace_hint(const_iterator, _Args&&... __args)
-	{ return __iconv_type()(emplace(std::forward<_Args>(__args)...)); }
+	emplace_hint(const_iterator __hint, _Args&&... __args)
+	{
+	  return _M_emplace(__hint, __unique_keys(),
+			    std::forward<_Args>(__args)...);
+	}
 
       // Insert member functions via inheritance.
 
@@ -1636,7 +1662,7 @@ 
 			  _Traits>::iterator
       _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-      _M_emplace(std::false_type, _Args&&... __args)
+      _M_emplace(const_iterator __hint, std::false_type, _Args&&... __args)
       {
 	// First build the node to get its hash code.
 	__node_type* __node = _M_allocate_node(std::forward<_Args>(__args)...);
@@ -1652,7 +1678,7 @@ 
 	    __throw_exception_again;
 	  }
 
-	return _M_insert_multi_node(__code, __node);
+	return _M_insert_multi_node(__hint._M_cur, __code, __node);
       }
 
   template<typename _Key, typename _Value,
@@ -1704,7 +1730,8 @@ 
 			_Traits>::iterator
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-    _M_insert_multi_node(__hash_code __code, __node_type* __node)
+    _M_insert_multi_node(__node_type* __hint, __hash_code __code,
+			 __node_type* __node)
     {
       const __rehash_state& __saved_state = _M_rehash_policy._M_state();
       std::pair<bool, std::size_t> __do_rehash
@@ -1719,13 +1746,27 @@ 
 	  const key_type& __k = this->_M_extract()(__node->_M_v());
 	  size_type __bkt = _M_bucket_index(__k, __code);
 
-	  // Find the node before an equivalent one.
-	  __node_base* __prev = _M_find_before_node(__bkt, __k, __code);
+	  // Find the node before an equivalent one or use hint if it exists and
+	  // if it is equivalent.
+	  __node_base* __prev
+	    = __hint != nullptr && this->_M_equals(__k, __code, __hint)
+	    ? __hint
+	    : _M_find_before_node(__bkt, __k, __code);
 	  if (__prev)
 	    {
 	      // Insert after the node before the equivalent one.
 	      __node->_M_nxt = __prev->_M_nxt;
 	      __prev->_M_nxt = __node;
+	      if (__prev == __hint)
+		// hint might be the last bucket node, in this case we need to
+		// update next bucket.
+		if (__node->_M_nxt
+		    && !this->_M_equals(__k, __code, __node->_M_next()))
+		  {
+		    size_type __next_bkt = _M_bucket_index(__node->_M_next());
+		    if (__next_bkt != __bkt)
+		      _M_buckets[__next_bkt] = __node;
+		  }
 	    }
 	  else
 	    // The inserted node has no equivalent in the
@@ -1780,7 +1821,7 @@ 
 			  _Traits>::iterator
       _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-      _M_insert(_Arg&& __v, std::false_type)
+      _M_insert(const_iterator __hint, _Arg&& __v, std::false_type)
       {
 	// First compute the hash code so that we don't do anything if it
 	// throws.
@@ -1789,7 +1830,7 @@ 
 	// Second allocate new node so that we don't rehash if it throws.
 	__node_type* __node = _M_allocate_node(std::forward<_Arg>(__v));
 
-	return _M_insert_multi_node(__code, __node);
+	return _M_insert_multi_node(__hint._M_cur, __code, __node);
       }
 
   template<typename _Key, typename _Value,
Index: include/bits/hashtable_policy.h
===================================================================
--- include/bits/hashtable_policy.h	(revision 198899)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -612,7 +612,6 @@ 
 
       using __unique_keys = typename __hashtable_base::__unique_keys;
       using __ireturn_type = typename __hashtable_base::__ireturn_type;
-      using __iconv_type = typename __hashtable_base::__iconv_type;
 
       __hashtable&
       _M_conjure_hashtable()
@@ -626,8 +625,11 @@ 
       }
 
       iterator
-      insert(const_iterator, const value_type& __v)
-      { return __iconv_type()(insert(__v)); }
+      insert(const_iterator __hint, const value_type& __v)
+      {
+	__hashtable& __h = _M_conjure_hashtable();
+	return __h._M_insert(__hint, __v, __unique_keys());
+      }
 
       void
       insert(initializer_list<value_type> __l)
@@ -711,8 +713,11 @@ 
       }
 
       iterator
-      insert(const_iterator, value_type&& __v)
-      { return insert(std::move(__v)).first; }
+      insert(const_iterator __hint, value_type&& __v)
+      {
+	__hashtable& __h = this->_M_conjure_hashtable();
+	return __h._M_insert(__hint, std::move(__v), __unique_keys());
+      }
     };
 
   /// Specialization.
@@ -745,9 +750,12 @@ 
       }
 
       iterator
-      insert(const_iterator, value_type&& __v)
-      { return insert(std::move(__v)); }
-     };
+      insert(const_iterator __hint, value_type&& __v)
+      {
+	__hashtable& __h = this->_M_conjure_hashtable();
+	return __h._M_insert(__hint, std::move(__v), __unique_keys());
+      }
+    };
 
   /// Specialization.
   template<typename _Key, typename _Value, typename _Alloc,
@@ -769,7 +777,6 @@ 
       using __unique_keys = typename __base_type::__unique_keys;
       using __hashtable = typename __base_type::__hashtable;
       using __ireturn_type = typename __base_type::__ireturn_type;
-      using __iconv_type = typename __base_type::__iconv_type;
 
       using __base_type::insert;
 
@@ -792,8 +799,12 @@ 
 
       template<typename _Pair, typename = _IFconsp<_Pair>>
 	iterator
-	insert(const_iterator, _Pair&& __v)
-	{ return __iconv_type()(insert(std::forward<_Pair>(__v))); }
+	insert(const_iterator __hint, _Pair&& __v)
+	{
+	  __hashtable& __h = this->_M_conjure_hashtable();
+	  return __h._M_emplace(__hint, __unique_keys(),
+				std::forward<_Pair>(__v));
+	}
    };
 
   /**
@@ -1470,10 +1481,6 @@ 
     using __ireturn_type = typename std::conditional<__unique_keys::value,
 						     std::pair<iterator, bool>,
 						     iterator>::type;
-
-    using __iconv_type = typename  std::conditional<__unique_keys::value,
-						    _Select1st, _Identity
-						    >::type;
   private:
     using _EqualEBO = _Hashtable_ebo_helper<0, _Equal>;
     using _EqualHelper =  _Equal_helper<_Key, _Value, _ExtractKey, _Equal,
Index: testsuite/23_containers/unordered_multimap/insert/hint.cc
===================================================================
--- testsuite/23_containers/unordered_multimap/insert/hint.cc	(revision 0)
+++ testsuite/23_containers/unordered_multimap/insert/hint.cc	(revision 0)
@@ -0,0 +1,123 @@ 
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// Insert with hint, specific to this library implementation.
+
+#include <iterator>
+#include <unordered_map>
+#include <testsuite_hooks.h>
+
+void test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  typedef std::unordered_multimap<int, int> Map;
+  typedef typename Map::value_type Pair;
+
+  Map m;
+
+  auto it1 = m.insert(Pair(0, 0));
+  VERIFY( it1 != m.end() );
+  VERIFY( it1->first == 0 );
+  VERIFY( it1->second == 0 );
+
+  auto it2 = m.insert(it1, Pair(0, 2));
+  VERIFY( it2 != m.end() );
+  VERIFY( it2 != it1 );
+  VERIFY( it2->second == 2 );
+  VERIFY( it2 == std::next(it1) );
+
+  Pair p(0, 1);
+  it2 = m.insert(it1, p);
+  VERIFY( it2 == std::next(it1) );
+}
+
+struct hasher
+{
+  std::size_t operator()(int val) const
+  { return val / 2; }
+};
+
+void test02()
+{
+  bool test __attribute__((unused)) = true;
+
+  typedef std::unordered_multimap<int, int, hasher> Map;
+  typedef typename Map::value_type Pair;
+
+  Map m;
+
+  auto it1 = m.insert(Pair(0, 0));
+  auto it2 = m.insert(it1, Pair(1, 0));
+  VERIFY( m.bucket(it1->first) == m.bucket(it2->first) );
+  VERIFY( m.bucket_size(m.bucket(it1->first)) == 2 );
+
+  auto it3 = m.insert(it2, Pair(2, 0));
+  VERIFY( m.bucket(it3->first) != m.bucket(it2->first) );
+  VERIFY( m.bucket_size(m.bucket(it3->first)) == 1 );
+
+  auto it4 = m.insert(it1, Pair(0, 1));
+  VERIFY( it4 == std::next(it1) );
+  VERIFY( m.bucket_size(m.bucket(it1->first)) == 3 );
+  VERIFY( m.bucket_size(m.bucket(it3->first)) == 1 );
+
+  Pair p(1, 1);
+  it4 = m.insert(it2, p);
+  VERIFY( it4 == std::next(it2) );
+  VERIFY( m.bucket_size(m.bucket(it1->first)) == 4 );
+  auto range = m.equal_range(0);
+  VERIFY( std::distance(range.first, range.second) == 2 );
+  range = m.equal_range(1);
+  VERIFY( std::distance(range.first, range.second) == 2 );
+}
+
+void test03()
+{
+  bool test __attribute__((unused)) = true;
+
+  typedef std::unordered_multimap<int, int> Map;
+  typedef typename Map::value_type Pair;
+
+  Map m;
+
+  auto it1 = m.insert(Pair(0, 0));
+  VERIFY( it1 != m.end() );
+  VERIFY( it1->first == 0 );
+  VERIFY( it1->second == 0 );
+
+  auto it2 = m.emplace_hint(it1, std::piecewise_construct,
+				 std::make_tuple(0),
+				 std::make_tuple(2));
+  VERIFY( it2 != m.end() );
+  VERIFY( it2 != it1 );
+  VERIFY( it2->second == 2 );
+  VERIFY( it2 == std::next(it1) );
+
+  Pair p(0, 1);
+  it2 = m.emplace_hint(it1, p);
+  VERIFY( it2 == std::next(it1) );
+}
+
+int main()
+{
+  test01();
+  test02();
+  test03();
+  return 0;
+}
Index: doc/xml/manual/containers.xml
===================================================================
--- doc/xml/manual/containers.xml	(revision 198899)
+++ doc/xml/manual/containers.xml	(working copy)
@@ -354,6 +354,55 @@ 
   <info><title>Unordered Associative</title></info>
   <?dbhtml filename="unordered_associative.html"?>
 
+  <section xml:id="containers.unordered.insert_hints" xreflabel="Insertion Hints">
+    <info><title>Insertion Hints</title></info>
+
+    <para>
+     Here is how the hinting works in the libstdc++ implementation of unordered
+     containers, and the rational behind this behavior.
+    </para>
+    <para>
+      In the following text, the phrase <emphasis>equivalent to</emphasis> refer
+      to the result of the invocation of the equal predicate imposed on the
+      container by its <code>key_equal</code> object, which defaults to (basically)
+      <quote>==</quote>.
+    </para>
+    <para>
+      Unordered containers can be seen as a <code>std::vector</code> of
+      <code>std::forward_list</code>. The <code>std::vector</code> represents
+      the buckets and each <code>std::forward_list</code> is the list of nodes
+      belonging to the same bucket. When inserting an element in such a data
+      structure we first need to compute the element hash code to find the
+      bucket to insert the element to, the second step depends on the uniqueness
+      of elements in the container.
+    </para>
+    <para>
+      In the case of <code>std::unordered_set</code> and
+      <code>std::unordered_map</code> you need to look through all bucket's
+      elements for an equivalent one. If there is none the insertion can be
+      achieved, otherwise the insertion fails. As we always need to loop though
+      all bucket's elements, the hint doesn't tell us if the element is already
+      present, and we don't have any constraint on where the new element is to
+      be inserted, the hint won't be of any help and will then be ignored.
+    </para>
+    <para>
+      In the case of <code>std::unordered_multiset</code>
+      and <code>std::unordered_multimap</code> equivalent elements must be
+      linked together so that the <code>equal_range(const key_type&amp;)</code>
+      can return the range of iterators pointing to all equivalent elements.
+      This is where hinting can be used to point to another equivalent element
+      already part of the container and so skip all non equivalent elements of
+      the bucket. So to be useful the hint shall point to an element equivalent
+      to the one being inserted. The new element will be then inserted right
+      after the hint. Note that because of an implementation detail inserting
+      after a node can require to update the bucket of the following node. To
+      check if the next bucket is to be modified we need to compute following
+      node hash code. So if you want your hint to be really efficient it should
+      be followed by another equivalent element, the implementation will detect
+      this equivalence and won't compute next element hash code.
+    </para>
+  </section>
+
   <section xml:id="containers.unordered.hash" xreflabel="Hash">
     <info><title>Hash Code</title></info>