Patchwork Unordered container insertion hints

login
register
mail settings
Submitter François Dumont
Date June 6, 2013, 8:33 p.m.
Message ID <51B0F21A.9030707@gmail.com>
Download mbox | patch
Permalink /patch/249543/
State New
Headers show

Comments

François Dumont - June 6, 2013, 8:33 p.m.
On 05/24/2013 01:00 AM, Paolo Carlini wrote:
> On 05/23/2013 10:01 PM, François Dumont wrote:
>> Some feedback regarding this patch ?
> Two quick ones: what if the hint is wrong? I suppose the insertion 
> succeeds anyway, it's only a little waste of time, right?

Right.

> Is it possible that for instance something throws in that case and 
> would not now (when the hint is simply ignored)? In case, check and 
> re-check we are still conforming.
I consider the hint only if it is equivalent to the inserted element so 
I invoke the equal_to functor for that. The invocation of the equal_to 
functor is already done if no hint is granted at the same location. So 
usage of the hint has no impact on exception safety.
>
> In any case, I think it's quite easy to notice if an implementation is 
> using the hint in this way or a similar one basing on some simple 
> benchmarks, without looking of course at the actual implementation 
> code. Do we have any idea what other implementations are doing? Like, 
> eg, they invented something for unordered_set and map too? Or a better 
> way to exploit the hint for the multi variants?

     I only bench llvm/clang implementation and notice no different with 
or without hint, I guess it is simply ignored. I haven't plan to check 
or bench other implementations. The usage of hint I am introducing is 
quite natural considering the new unordered containers data model. And 
if anyone has a better idea to deal with it then he is welcome to 
contribute !

> Eventually I suppose we want to add a performance testcase to our 
> testsuite.
Good request and the reason why it took me so long to answer. Writing 
such benchmark have shown me that users should be very careful with it 
cause it can do more bad than good.

unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions w/o 
hint     120r  120u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions with 
any hint     130r  130u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions with 
good hint      54r   54u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions with 
perfect hint      36r   36u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions w/o 
hint      40r   40u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions with 
any hint      38r   38u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions with 
bad hint      49r   50u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions with 
perfect hint      34r   35u    0s 64000016mem    0pf

     The small number represents how many time the same element is 
inserted and the big one the number of different elements. 100 X 20000 
means that we loop 100 times inserting the 20000 elements during each 
loop. 20000 X 100 means that the main loop is on the elements and we 
insert each 100 times. Being able to insert all the equivalent elements 
at the same time or not has a major impact on the performances to get 
the same result. This is because when a new element is inserted it will 
be first in its bucket and the following 99 insertions will benefit from 
it even without any hint.

     The bench also show that a bad hint can be worst than no hint. A 
bad hint is one that once used require to check that next bucket is not 
impacted by the insertion. To do so it requires a hash code computation 
(if it is not cached like in my use case) and check. I have added a word 
about being able to check performance before using hints. Here is the 
result using the default std::hash<std::string>, hash code is being cached.

unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions w/o 
hint      76r   76u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions with 
any hint      83r   83u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions with 
good hint      29r   29u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions with 
perfect hint      24r   23u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions w/o 
hint      27r   26u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions with 
any hint      24r   24u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions with 
bad hint      27r   27u    0s 64000016mem    0pf
unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions with 
perfect hint      23r   23u    0s 64000016mem    0pf

Almost no impact in this case when using a bad hint. I consider adding 
another condition to the use of the hint which is to have the element 
after the hint also equivalent to the inserted element. This way we are 
sure that next bucket won't be affected and do not need to compute a 
hash code. But the result was a rather counter-intuitive hint. To get a 
good one you had to do:

std::unordered_multiset<std::string> ums;
const std::string foo("foo");
ums.insert(foo);
auto hint = ums.insert(foo);
ums.insert(hint, foo);

     it means keeping the second insertion iterator as the best hint to 
insert foo string. Not very convenient to use in real life and quite 
error prone cause if you keep the first insertion hint then you get 
stuck with the worst hint ever !

Compared to the previous patch I have just added some __builtin_expect 
cause hint are not likely to be used very often.

Tested under Linux x86_64, ok to commit ? With or without doc ?

2013-06-06  François Dumont <fdumont@gcc.gnu.org>

     * include/bits/hashtable_policy.h (_Insert_base): Consider hint in
     insert methods.
     * include/bits/hashtable.h: Likewise.
     * testsuite/23_containers/unordered_multimap/insert/hint.cc: New.
     * 
testsuite/performance/23_containers/insert/unordered_multiset_hint.cc:
     New.
     * doc/xml/manual/containers.xml: Document hinting in unordered
     containers.

François
François Dumont - June 12, 2013, 8:12 p.m.
Hi

     Any news regarding this patch ?

Thanks

François


On 06/06/2013 10:33 PM, François Dumont wrote:
> On 05/24/2013 01:00 AM, Paolo Carlini wrote:
>> On 05/23/2013 10:01 PM, François Dumont wrote:
>>> Some feedback regarding this patch ?
>> Two quick ones: what if the hint is wrong? I suppose the insertion 
>> succeeds anyway, it's only a little waste of time, right?
>
> Right.
>
>> Is it possible that for instance something throws in that case and 
>> would not now (when the hint is simply ignored)? In case, check and 
>> re-check we are still conforming.
> I consider the hint only if it is equivalent to the inserted element 
> so I invoke the equal_to functor for that. The invocation of the 
> equal_to functor is already done if no hint is granted at the same 
> location. So usage of the hint has no impact on exception safety.
>>
>> In any case, I think it's quite easy to notice if an implementation 
>> is using the hint in this way or a similar one basing on some simple 
>> benchmarks, without looking of course at the actual implementation 
>> code. Do we have any idea what other implementations are doing? Like, 
>> eg, they invented something for unordered_set and map too? Or a 
>> better way to exploit the hint for the multi variants?
>
>     I only bench llvm/clang implementation and notice no different 
> with or without hint, I guess it is simply ignored. I haven't plan to 
> check or bench other implementations. The usage of hint I am 
> introducing is quite natural considering the new unordered containers 
> data model. And if anyone has a better idea to deal with it then he is 
> welcome to contribute !
>
>> Eventually I suppose we want to add a performance testcase to our 
>> testsuite.
> Good request and the reason why it took me so long to answer. Writing 
> such benchmark have shown me that users should be very careful with it 
> cause it can do more bad than good.
>
> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions w/o 
> hint     120r  120u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
> with any hint     130r  130u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
> with good hint      54r   54u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
> with perfect hint      36r   36u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions w/o 
> hint      40r   40u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
> with any hint      38r   38u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
> with bad hint      49r   50u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
> with perfect hint      34r   35u    0s 64000016mem    0pf
>
>     The small number represents how many time the same element is 
> inserted and the big one the number of different elements. 100 X 20000 
> means that we loop 100 times inserting the 20000 elements during each 
> loop. 20000 X 100 means that the main loop is on the elements and we 
> insert each 100 times. Being able to insert all the equivalent 
> elements at the same time or not has a major impact on the 
> performances to get the same result. This is because when a new 
> element is inserted it will be first in its bucket and the following 
> 99 insertions will benefit from it even without any hint.
>
>     The bench also show that a bad hint can be worst than no hint. A 
> bad hint is one that once used require to check that next bucket is 
> not impacted by the insertion. To do so it requires a hash code 
> computation (if it is not cached like in my use case) and check. I 
> have added a word about being able to check performance before using 
> hints. Here is the result using the default std::hash<std::string>, 
> hash code is being cached.
>
> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions w/o 
> hint      76r   76u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
> with any hint      83r   83u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
> with good hint      29r   29u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
> with perfect hint      24r   23u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions w/o 
> hint      27r   26u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
> with any hint      24r   24u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
> with bad hint      27r   27u    0s 64000016mem    0pf
> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
> with perfect hint      23r   23u    0s 64000016mem    0pf
>
> Almost no impact in this case when using a bad hint. I consider adding 
> another condition to the use of the hint which is to have the element 
> after the hint also equivalent to the inserted element. This way we 
> are sure that next bucket won't be affected and do not need to compute 
> a hash code. But the result was a rather counter-intuitive hint. To 
> get a good one you had to do:
>
> std::unordered_multiset<std::string> ums;
> const std::string foo("foo");
> ums.insert(foo);
> auto hint = ums.insert(foo);
> ums.insert(hint, foo);
>
>     it means keeping the second insertion iterator as the best hint to 
> insert foo string. Not very convenient to use in real life and quite 
> error prone cause if you keep the first insertion hint then you get 
> stuck with the worst hint ever !
>
> Compared to the previous patch I have just added some __builtin_expect 
> cause hint are not likely to be used very often.
>
> Tested under Linux x86_64, ok to commit ? With or without doc ?
>
> 2013-06-06  François Dumont <fdumont@gcc.gnu.org>
>
>     * include/bits/hashtable_policy.h (_Insert_base): Consider hint in
>     insert methods.
>     * include/bits/hashtable.h: Likewise.
>     * testsuite/23_containers/unordered_multimap/insert/hint.cc: New.
>     * 
> testsuite/performance/23_containers/insert/unordered_multiset_hint.cc:
>     New.
>     * doc/xml/manual/containers.xml: Document hinting in unordered
>     containers.
>
> François
>
François Dumont - June 19, 2013, 7:56 p.m.
Still no chance to have a look ?

     I think that that patch is a really safe one. Those that do not use 
hint won't be impacted. Those that are already using it without any 
reason might experiment a small performance issue if they found the way 
to always use the worst possible hint.

François


On 06/12/2013 10:12 PM, François Dumont wrote:
> Hi
>
>     Any news regarding this patch ?
>
> Thanks
>
> François
>
>
> On 06/06/2013 10:33 PM, François Dumont wrote:
>> On 05/24/2013 01:00 AM, Paolo Carlini wrote:
>>> On 05/23/2013 10:01 PM, François Dumont wrote:
>>>> Some feedback regarding this patch ?
>>> Two quick ones: what if the hint is wrong? I suppose the insertion 
>>> succeeds anyway, it's only a little waste of time, right?
>>
>> Right.
>>
>>> Is it possible that for instance something throws in that case and 
>>> would not now (when the hint is simply ignored)? In case, check and 
>>> re-check we are still conforming.
>> I consider the hint only if it is equivalent to the inserted element 
>> so I invoke the equal_to functor for that. The invocation of the 
>> equal_to functor is already done if no hint is granted at the same 
>> location. So usage of the hint has no impact on exception safety.
>>>
>>> In any case, I think it's quite easy to notice if an implementation 
>>> is using the hint in this way or a similar one basing on some simple 
>>> benchmarks, without looking of course at the actual implementation 
>>> code. Do we have any idea what other implementations are doing? 
>>> Like, eg, they invented something for unordered_set and map too? Or 
>>> a better way to exploit the hint for the multi variants?
>>
>>     I only bench llvm/clang implementation and notice no different 
>> with or without hint, I guess it is simply ignored. I haven't plan to 
>> check or bench other implementations. The usage of hint I am 
>> introducing is quite natural considering the new unordered containers 
>> data model. And if anyone has a better idea to deal with it then he 
>> is welcome to contribute !
>>
>>> Eventually I suppose we want to add a performance testcase to our 
>>> testsuite.
>> Good request and the reason why it took me so long to answer. Writing 
>> such benchmark have shown me that users should be very careful with 
>> it cause it can do more bad than good.
>>
>> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
>> w/o hint     120r  120u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
>> with any hint     130r  130u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
>> with good hint      54r   54u    0s 64000016mem 0pf
>> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
>> with perfect hint      36r   36u    0s 64000016mem 0pf
>> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
>> w/o hint      40r   40u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
>> with any hint      38r   38u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
>> with bad hint      49r   50u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
>> with perfect hint      34r   35u    0s 64000016mem 0pf
>>
>>     The small number represents how many time the same element is 
>> inserted and the big one the number of different elements. 100 X 
>> 20000 means that we loop 100 times inserting the 20000 elements 
>> during each loop. 20000 X 100 means that the main loop is on the 
>> elements and we insert each 100 times. Being able to insert all the 
>> equivalent elements at the same time or not has a major impact on the 
>> performances to get the same result. This is because when a new 
>> element is inserted it will be first in its bucket and the following 
>> 99 insertions will benefit from it even without any hint.
>>
>>     The bench also show that a bad hint can be worst than no hint. A 
>> bad hint is one that once used require to check that next bucket is 
>> not impacted by the insertion. To do so it requires a hash code 
>> computation (if it is not cached like in my use case) and check. I 
>> have added a word about being able to check performance before using 
>> hints. Here is the result using the default std::hash<std::string>, 
>> hash code is being cached.
>>
>> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
>> w/o hint      76r   76u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
>> with any hint      83r   83u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
>> with good hint      29r   29u    0s 64000016mem 0pf
>> unordered_multiset_hint.cc    unordered_set 100 X 20000 insertions 
>> with perfect hint      24r   23u    0s 64000016mem 0pf
>> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
>> w/o hint      27r   26u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
>> with any hint      24r   24u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
>> with bad hint      27r   27u    0s 64000016mem    0pf
>> unordered_multiset_hint.cc    unordered_set 20000 X 100 insertions 
>> with perfect hint      23r   23u    0s 64000016mem 0pf
>>
>> Almost no impact in this case when using a bad hint. I consider 
>> adding another condition to the use of the hint which is to have the 
>> element after the hint also equivalent to the inserted element. This 
>> way we are sure that next bucket won't be affected and do not need to 
>> compute a hash code. But the result was a rather counter-intuitive 
>> hint. To get a good one you had to do:
>>
>> std::unordered_multiset<std::string> ums;
>> const std::string foo("foo");
>> ums.insert(foo);
>> auto hint = ums.insert(foo);
>> ums.insert(hint, foo);
>>
>>     it means keeping the second insertion iterator as the best hint 
>> to insert foo string. Not very convenient to use in real life and 
>> quite error prone cause if you keep the first insertion hint then you 
>> get stuck with the worst hint ever !
>>
>> Compared to the previous patch I have just added some 
>> __builtin_expect cause hint are not likely to be used very often.
>>
>> Tested under Linux x86_64, ok to commit ? With or without doc ?
>>
>> 2013-06-06  François Dumont <fdumont@gcc.gnu.org>
>>
>>     * include/bits/hashtable_policy.h (_Insert_base): Consider hint in
>>     insert methods.
>>     * include/bits/hashtable.h: Likewise.
>>     * testsuite/23_containers/unordered_multimap/insert/hint.cc: New.
>>     * 
>> testsuite/performance/23_containers/insert/unordered_multiset_hint.cc:
>>     New.
>>     * doc/xml/manual/containers.xml: Document hinting in unordered
>>     containers.
>>
>> François
>>
>
Jonathan Wakely - June 23, 2013, 12:51 p.m.
On 19 June 2013 20:56, François Dumont wrote:
>     Still no chance to have a look ?

I'll try to finish reviewing it today, thanks for the reminder.
Jonathan Wakely - June 28, 2013, 11:46 a.m.
On 23 June 2013 13:51, Jonathan Wakely wrote:
> On 19 June 2013 20:56, François Dumont wrote:
>>     Still no chance to have a look ?
>
> I'll try to finish reviewing it today, thanks for the reminder.

Sorry for the delay.  The patch is OK, with a few small changes to the new docs:

Please change "rational" to "rationale"

"can require to update the bucket" should be "can require updating the bucket"

"need to compute following node hash code." should be "need to compute
the following node's hash code."

"won't compute next element hash code" should be "won't compute the
next element's hash code"

"bench" should be "benchmark"

I would say "more harm than good" instead of "more bad than good"

Thanks!

Patch

Index: include/bits/hashtable_policy.h
===================================================================
--- include/bits/hashtable_policy.h	(revision 199457)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -612,7 +612,6 @@ 
 
       using __unique_keys = typename __hashtable_base::__unique_keys;
       using __ireturn_type = typename __hashtable_base::__ireturn_type;
-      using __iconv_type = typename __hashtable_base::__iconv_type;
 
       __hashtable&
       _M_conjure_hashtable()
@@ -626,8 +625,11 @@ 
       }
 
       iterator
-      insert(const_iterator, const value_type& __v)
-      { return __iconv_type()(insert(__v)); }
+      insert(const_iterator __hint, const value_type& __v)
+      {
+	__hashtable& __h = _M_conjure_hashtable();
+	return __h._M_insert(__hint, __v, __unique_keys());
+      }
 
       void
       insert(initializer_list<value_type> __l)
@@ -711,8 +713,11 @@ 
       }
 
       iterator
-      insert(const_iterator, value_type&& __v)
-      { return insert(std::move(__v)).first; }
+      insert(const_iterator __hint, value_type&& __v)
+      {
+	__hashtable& __h = this->_M_conjure_hashtable();
+	return __h._M_insert(__hint, std::move(__v), __unique_keys());
+      }
     };
 
   /// Specialization.
@@ -745,9 +750,12 @@ 
       }
 
       iterator
-      insert(const_iterator, value_type&& __v)
-      { return insert(std::move(__v)); }
-     };
+      insert(const_iterator __hint, value_type&& __v)
+      {
+	__hashtable& __h = this->_M_conjure_hashtable();
+	return __h._M_insert(__hint, std::move(__v), __unique_keys());
+      }
+    };
 
   /// Specialization.
   template<typename _Key, typename _Value, typename _Alloc,
@@ -769,7 +777,6 @@ 
       using __unique_keys = typename __base_type::__unique_keys;
       using __hashtable = typename __base_type::__hashtable;
       using __ireturn_type = typename __base_type::__ireturn_type;
-      using __iconv_type = typename __base_type::__iconv_type;
 
       using __base_type::insert;
 
@@ -792,8 +799,12 @@ 
 
       template<typename _Pair, typename = _IFconsp<_Pair>>
 	iterator
-	insert(const_iterator, _Pair&& __v)
-	{ return __iconv_type()(insert(std::forward<_Pair>(__v))); }
+	insert(const_iterator __hint, _Pair&& __v)
+	{
+	  __hashtable& __h = this->_M_conjure_hashtable();
+	  return __h._M_emplace(__hint, __unique_keys(),
+				std::forward<_Pair>(__v));
+	}
    };
 
   /**
@@ -1470,10 +1481,6 @@ 
     using __ireturn_type = typename std::conditional<__unique_keys::value,
 						     std::pair<iterator, bool>,
 						     iterator>::type;
-
-    using __iconv_type = typename  std::conditional<__unique_keys::value,
-						    _Select1st, _Identity
-						    >::type;
   private:
     using _EqualEBO = _Hashtable_ebo_helper<0, _Equal>;
     using _EqualHelper =  _Equal_helper<_Key, _Value, _ExtractKey, _Equal,
Index: include/bits/hashtable.h
===================================================================
--- include/bits/hashtable.h	(revision 199457)
+++ include/bits/hashtable.h	(working copy)
@@ -225,7 +225,6 @@ 
       using __node_base = typename __hashtable_base::__node_base;
       using __bucket_type = typename __hashtable_base::__bucket_type;
       using __ireturn_type = typename __hashtable_base::__ireturn_type;
-      using __iconv_type = typename __hashtable_base::__iconv_type;
 
       using __map_base = __detail::_Map_base<_Key, _Value, _Alloc, _ExtractKey,
 					     _Equal, _H1, _H2, _Hash,
@@ -680,7 +679,8 @@ 
       // Insert node with hash code __code. Take ownership of the node,
       // deallocate it on exception.
       iterator
-      _M_insert_multi_node(__hash_code __code, __node_type* __n);
+      _M_insert_multi_node(__node_type* __hint,
+			   __hash_code __code, __node_type* __n);
 
       template<typename... _Args>
 	std::pair<iterator, bool>
@@ -688,16 +688,39 @@ 
 
       template<typename... _Args>
 	iterator
-	_M_emplace(std::false_type, _Args&&... __args);
+	_M_emplace(std::false_type __uk, _Args&&... __args)
+	{ return _M_emplace(cend(), __uk, std::forward<_Args>(__args)...); }
 
+      // Emplace with hint, useless when keys are unique.
+      template<typename... _Args>
+	iterator
+	_M_emplace(const_iterator, std::true_type __uk, _Args&&... __args)
+	{ return _M_emplace(__uk, std::forward<_Args>(__args)...).first; }
+
+      template<typename... _Args>
+	iterator
+	_M_emplace(const_iterator, std::false_type, _Args&&... __args);
+
       template<typename _Arg>
 	std::pair<iterator, bool>
 	_M_insert(_Arg&&, std::true_type);
 
       template<typename _Arg>
 	iterator
-	_M_insert(_Arg&&, std::false_type);
+	_M_insert(_Arg&& __arg, std::false_type __uk)
+	{ return _M_insert(cend(), std::forward<_Arg>(__arg), __uk); }
 
+      // Insert with hint, not used when keys are unique.
+      template<typename _Arg>
+	iterator
+	_M_insert(const_iterator, _Arg&& __arg, std::true_type __uk)
+	{ return _M_insert(std::forward<_Arg>(__arg), __uk).first; }
+
+      // Insert with hint when keys are not unique.
+      template<typename _Arg>
+	iterator
+	_M_insert(const_iterator, _Arg&&, std::false_type);
+
       size_type
       _M_erase(std::true_type, const key_type&);
 
@@ -716,8 +739,11 @@ 
 
       template<typename... _Args>
 	iterator
-	emplace_hint(const_iterator, _Args&&... __args)
-	{ return __iconv_type()(emplace(std::forward<_Args>(__args)...)); }
+	emplace_hint(const_iterator __hint, _Args&&... __args)
+	{
+	  return _M_emplace(__hint, __unique_keys(),
+			    std::forward<_Args>(__args)...);
+	}
 
       // Insert member functions via inheritance.
 
@@ -1636,7 +1662,7 @@ 
 			  _Traits>::iterator
       _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-      _M_emplace(std::false_type, _Args&&... __args)
+      _M_emplace(const_iterator __hint, std::false_type, _Args&&... __args)
       {
 	// First build the node to get its hash code.
 	__node_type* __node = _M_allocate_node(std::forward<_Args>(__args)...);
@@ -1652,7 +1678,7 @@ 
 	    __throw_exception_again;
 	  }
 
-	return _M_insert_multi_node(__code, __node);
+	return _M_insert_multi_node(__hint._M_cur, __code, __node);
       }
 
   template<typename _Key, typename _Value,
@@ -1704,7 +1730,8 @@ 
 			_Traits>::iterator
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-    _M_insert_multi_node(__hash_code __code, __node_type* __node)
+    _M_insert_multi_node(__node_type* __hint, __hash_code __code,
+			 __node_type* __node)
     {
       const __rehash_state& __saved_state = _M_rehash_policy._M_state();
       std::pair<bool, std::size_t> __do_rehash
@@ -1719,13 +1746,28 @@ 
 	  const key_type& __k = this->_M_extract()(__node->_M_v());
 	  size_type __bkt = _M_bucket_index(__k, __code);
 
-	  // Find the node before an equivalent one.
-	  __node_base* __prev = _M_find_before_node(__bkt, __k, __code);
+	  // Find the node before an equivalent one or use hint if it exists and
+	  // if it is equivalent.
+	  __node_base* __prev
+	    = __builtin_expect(__hint != nullptr, false)
+	      && this->_M_equals(__k, __code, __hint)
+		? __hint
+		: _M_find_before_node(__bkt, __k, __code);
 	  if (__prev)
 	    {
 	      // Insert after the node before the equivalent one.
 	      __node->_M_nxt = __prev->_M_nxt;
 	      __prev->_M_nxt = __node;
+	      if (__builtin_expect(__prev == __hint, false))
+	      	// hint might be the last bucket node, in this case we need to
+	      	// update next bucket.
+	      	if (__node->_M_nxt
+	      	    && !this->_M_equals(__k, __code, __node->_M_next()))
+	      	  {
+	      	    size_type __next_bkt = _M_bucket_index(__node->_M_next());
+	      	    if (__next_bkt != __bkt)
+	      	      _M_buckets[__next_bkt] = __node;
+	      	  }
 	    }
 	  else
 	    // The inserted node has no equivalent in the
@@ -1780,7 +1822,7 @@ 
 			  _Traits>::iterator
       _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-      _M_insert(_Arg&& __v, std::false_type)
+      _M_insert(const_iterator __hint, _Arg&& __v, std::false_type)
       {
 	// First compute the hash code so that we don't do anything if it
 	// throws.
@@ -1789,7 +1831,7 @@ 
 	// Second allocate new node so that we don't rehash if it throws.
 	__node_type* __node = _M_allocate_node(std::forward<_Arg>(__v));
 
-	return _M_insert_multi_node(__code, __node);
+	return _M_insert_multi_node(__hint._M_cur, __code, __node);
       }
 
   template<typename _Key, typename _Value,
Index: testsuite/23_containers/unordered_multimap/insert/hint.cc
===================================================================
--- testsuite/23_containers/unordered_multimap/insert/hint.cc	(revision 0)
+++ testsuite/23_containers/unordered_multimap/insert/hint.cc	(revision 0)
@@ -0,0 +1,123 @@ 
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// Insert with hint, specific to this library implementation.
+
+#include <iterator>
+#include <unordered_map>
+#include <testsuite_hooks.h>
+
+void test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  typedef std::unordered_multimap<int, int> Map;
+  typedef typename Map::value_type Pair;
+
+  Map m;
+
+  auto it1 = m.insert(Pair(0, 0));
+  VERIFY( it1 != m.end() );
+  VERIFY( it1->first == 0 );
+  VERIFY( it1->second == 0 );
+
+  auto it2 = m.insert(it1, Pair(0, 2));
+  VERIFY( it2 != m.end() );
+  VERIFY( it2 != it1 );
+  VERIFY( it2->second == 2 );
+  VERIFY( it2 == std::next(it1) );
+
+  Pair p(0, 1);
+  it2 = m.insert(it1, p);
+  VERIFY( it2 == std::next(it1) );
+}
+
+struct hasher
+{
+  std::size_t operator()(int val) const
+  { return val / 2; }
+};
+
+void test02()
+{
+  bool test __attribute__((unused)) = true;
+
+  typedef std::unordered_multimap<int, int, hasher> Map;
+  typedef typename Map::value_type Pair;
+
+  Map m;
+
+  auto it1 = m.insert(Pair(0, 0));
+  auto it2 = m.insert(it1, Pair(1, 0));
+  VERIFY( m.bucket(it1->first) == m.bucket(it2->first) );
+  VERIFY( m.bucket_size(m.bucket(it1->first)) == 2 );
+
+  auto it3 = m.insert(it2, Pair(2, 0));
+  VERIFY( m.bucket(it3->first) != m.bucket(it2->first) );
+  VERIFY( m.bucket_size(m.bucket(it3->first)) == 1 );
+
+  auto it4 = m.insert(it1, Pair(0, 1));
+  VERIFY( it4 == std::next(it1) );
+  VERIFY( m.bucket_size(m.bucket(it1->first)) == 3 );
+  VERIFY( m.bucket_size(m.bucket(it3->first)) == 1 );
+
+  Pair p(1, 1);
+  it4 = m.insert(it2, p);
+  VERIFY( it4 == std::next(it2) );
+  VERIFY( m.bucket_size(m.bucket(it1->first)) == 4 );
+  auto range = m.equal_range(0);
+  VERIFY( std::distance(range.first, range.second) == 2 );
+  range = m.equal_range(1);
+  VERIFY( std::distance(range.first, range.second) == 2 );
+}
+
+void test03()
+{
+  bool test __attribute__((unused)) = true;
+
+  typedef std::unordered_multimap<int, int> Map;
+  typedef typename Map::value_type Pair;
+
+  Map m;
+
+  auto it1 = m.insert(Pair(0, 0));
+  VERIFY( it1 != m.end() );
+  VERIFY( it1->first == 0 );
+  VERIFY( it1->second == 0 );
+
+  auto it2 = m.emplace_hint(it1, std::piecewise_construct,
+				 std::make_tuple(0),
+				 std::make_tuple(2));
+  VERIFY( it2 != m.end() );
+  VERIFY( it2 != it1 );
+  VERIFY( it2->second == 2 );
+  VERIFY( it2 == std::next(it1) );
+
+  Pair p(0, 1);
+  it2 = m.emplace_hint(it1, p);
+  VERIFY( it2 == std::next(it1) );
+}
+
+int main()
+{
+  test01();
+  test02();
+  test03();
+  return 0;
+}
Index: testsuite/performance/23_containers/insert/unordered_multiset_hint.cc
===================================================================
--- testsuite/performance/23_containers/insert/unordered_multiset_hint.cc	(revision 0)
+++ testsuite/performance/23_containers/insert/unordered_multiset_hint.cc	(revision 0)
@@ -0,0 +1,336 @@ 
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+
+#include <testsuite_performance.h>
+
+#include <sstream>
+#include <string>
+#include <vector>
+#include <unordered_set>
+
+namespace
+{
+  const int sz = 2000000;
+  const std::string pattern = "test string #";
+  const int nb_copies = 100;
+
+  // Special std::string hasher based on std::hash<std::string> but not tag as
+  // slow so that hash code is not cached. It is easier to show impact of
+  // hinting in this context.
+  struct hasher
+  {
+    std::size_t
+    operator()(const std::string& str) const noexcept
+    {
+      //std::istringstream istr(str.substr(pattern.size()));
+      //std::size_t str_index;
+      //istr >> str_index;
+      //return str_index;
+      std::hash<std::string> std_hasher;
+      return std_hasher(str);
+    }
+  };
+
+  using ums_t = std::unordered_multiset<std::string, hasher>;
+
+  void
+  insert_with_perfect_hint1(const std::vector<std::string>& strs,
+			    ums_t& ms)
+  {
+    std::vector<typename ums_t::iterator> hints;
+    hints.reserve(strs.size());
+    for (auto str : strs)
+      hints.push_back(ms.insert(str));
+
+    for (int j = 1; j != nb_copies; ++j)
+      for (std::size_t i = 0; i != strs.size(); ++i)
+	ms.insert(hints[i], strs[i]);
+  }
+
+  void
+  insert_with_perfect_hint2(const std::vector<std::string>& strs,
+			    ums_t& ms)
+  {
+    std::vector<typename ums_t::iterator> hints;
+    hints.reserve(strs.size());
+    for (auto str : strs)
+      hints.push_back(ms.insert(str));
+
+    for (std::size_t i = 0; i != strs.size(); ++i)
+      for (int j = 1; j != nb_copies; ++j)
+	ms.insert(hints[i], strs[i]);
+  }
+
+  // Always insert with the result of the previous insertion. The result of
+  // the previous insertion will never be followed by an equivalent node
+  // resulting in a re-computation of its hash code which is expensive.
+  void
+  insert_with_good_hint(const std::vector<std::string>& strs,
+			ums_t& ms)
+  {
+    std::vector<typename ums_t::iterator> hints;
+    hints.reserve(strs.size());
+    for (auto str : strs)
+      hints.push_back(ms.insert(str));
+
+    for (int j = 1; j != nb_copies; ++j)
+      for (std::size_t i = 0; i != strs.size(); ++i)
+	hints[i] = ms.insert(hints[i], strs[i]);
+  }
+
+  // Note that the following use case is particularly bad especially compared to
+  // the solution without hint because without hint the first insertion will put
+  // it first in the bucket and following insertions will detect it and insert
+  // just before. By giving a hint insertion will be done just after forcing to
+  // check if it has no impact on the following bucket.
+  void
+  insert_with_bad_hint(const std::vector<std::string>& strs,
+		       ums_t& ms)
+  {
+    std::vector<typename ums_t::iterator> hints;
+    hints.reserve(strs.size());
+    for (auto str : strs)
+      hints.push_back(ms.insert(str));
+
+    for (std::size_t i = 0; i != strs.size(); ++i)
+      for (int j = 1; j != nb_copies; ++j)
+	hints[i] = ms.insert(hints[i], strs[i]);
+  }
+
+  void
+  insert_without_hint1(const std::vector<std::string>& strs,
+		       ums_t& ms)
+  {
+    std::vector<typename ums_t::iterator> hints;
+    hints.reserve(strs.size());
+    for (auto str : strs)
+      hints.push_back(ms.insert(str));
+
+    for (int j = 1; j != nb_copies; ++j)
+      for (std::size_t i = 0; i != strs.size(); ++i)
+	hints[i] = ms.insert(strs[i]);
+  }
+
+  // This version is the best one if you insert all equivalent elements at the
+  // same time. It demonstrates that most of the time it is better not to give
+  // any hint unless you have written a benchmark for your application showing
+  // that it does have a positive effect.
+  void
+  insert_without_hint2(const std::vector<std::string>& strs,
+		       ums_t& ms)
+  {
+    std::vector<typename ums_t::iterator> hints;
+    hints.reserve(strs.size());
+    for (auto str : strs)
+      hints.push_back(ms.insert(str));
+
+    for (std::size_t i = 0; i != strs.size(); ++i)
+      for (int j = 1; j != nb_copies; ++j)
+	hints[i] = ms.insert(strs[i]);
+  }
+
+  void
+  insert_with_any_hint1(const std::vector<std::string>& strs,
+			ums_t& ms)
+  {
+    std::vector<typename ums_t::iterator> hints;
+    hints.reserve(strs.size());
+    for (auto str : strs)
+      hints.push_back(ms.insert(ms.begin(), str));
+
+    std::size_t offset = strs.size() / 2;
+    for (int j = 1; j != nb_copies; ++j)
+      for (std::size_t i = 0; i != strs.size(); ++i)
+	{
+	  ms.insert(hints[(i + offset) % hints.size()], strs[i]);
+	  ++offset;
+	}
+  }
+
+  void
+  insert_with_any_hint2(const std::vector<std::string>& strs,
+			ums_t& ms)
+  {
+    std::vector<typename ums_t::iterator> hints;
+    hints.reserve(strs.size());
+    for (auto str : strs)
+      hints.push_back(ms.insert(ms.begin(), str));
+
+    std::size_t offset = strs.size() / 2;
+    for (std::size_t i = 0; i != strs.size(); ++i)
+      for (int j = 1; j != nb_copies; ++j)
+	{
+	  ms.insert(hints[(i + offset) % hints.size()], strs[i]);
+	  ++offset;
+	}
+  }
+}
+
+int main()
+{
+  using namespace __gnu_test;
+
+  const int nb_iter = 10;
+
+  std::vector<std::string> strs;
+  strs.reserve(sz / nb_copies);
+
+  for (int i = 0; i != sz / nb_copies; ++i)
+    {
+      std::ostringstream osstr;
+      osstr << pattern << i;
+      strs.push_back(osstr.str());
+    }
+
+  ums_t ms;
+  // Use a large load factor to make the context ideal for using hint because we
+  // will have many elements per bucket.
+  ms.max_load_factor(10.0f);
+  ms.reserve(sz);
+
+  // Warm up.
+  {
+    for (auto str : strs)
+      for (int j = 0; j != nb_copies; ++j)
+	ms.insert(str);
+  }
+
+  time_counter time_no_hint1, time_any_hint1, time_bad_hint, time_perfect_hint1;
+  time_counter time_no_hint2, time_any_hint2, time_good_hint, time_perfect_hint2;
+  resource_counter resource_no_hint1, resource_any_hint1, resource_bad_hint,
+    resource_perfect_hint1;
+  resource_counter resource_no_hint2, resource_any_hint2, resource_good_hint,
+    resource_perfect_hint2;
+
+  for (int i = 0; i != nb_iter; ++i)
+    {
+      // Bad hint
+      {
+	ms.clear();
+	start_counters(time_bad_hint, resource_bad_hint);
+	insert_with_bad_hint(strs, ms);
+	stop_counters(time_bad_hint, resource_bad_hint);
+      }
+
+      // No hint
+      {
+	ms.clear();
+	start_counters(time_no_hint1, resource_no_hint1);
+	insert_without_hint1(strs, ms);
+	stop_counters(time_no_hint1, resource_no_hint1);
+      }
+
+      // Any hint
+      {
+	ms.clear();
+	start_counters(time_any_hint1, resource_any_hint1);
+	insert_with_any_hint1(strs, ms);
+	stop_counters(time_any_hint1, resource_any_hint1);
+      }
+
+      // Good hint
+      {
+	ms.clear();
+	start_counters(time_good_hint, resource_good_hint);
+	insert_with_good_hint(strs, ms);
+	stop_counters(time_good_hint, resource_good_hint);
+      }
+
+      // No hint
+      {
+	ms.clear();
+	start_counters(time_no_hint2, resource_no_hint2);
+	insert_without_hint2(strs, ms);
+	stop_counters(time_no_hint2, resource_no_hint2);
+      }
+
+      // Perfect hint
+      {
+	ms.clear();
+	start_counters(time_perfect_hint2, resource_perfect_hint2);
+	insert_with_perfect_hint2(strs, ms);
+	stop_counters(time_perfect_hint2, resource_perfect_hint2);
+      }
+
+      // Any hint
+      {
+	ms.clear();
+	start_counters(time_any_hint2, resource_any_hint2);
+	insert_with_any_hint2(strs, ms);
+	stop_counters(time_any_hint2, resource_any_hint2);
+      }
+
+      // Perfect hint
+      {
+	ms.clear();
+	start_counters(time_perfect_hint1, resource_perfect_hint1);
+	insert_with_perfect_hint1(strs, ms);
+	stop_counters(time_perfect_hint1, resource_perfect_hint1);
+      }
+    }
+
+  std::ostringstream ostr;
+  ostr << "unordered_set " << nb_copies << " X " << sz / nb_copies
+       << " insertions w/o hint";
+  report_performance(__FILE__, ostr.str().c_str(),
+		     time_no_hint1, resource_no_hint1);
+
+  ostr.str("");
+  ostr << "unordered_set " << nb_copies << " X " << sz / nb_copies
+       << " insertions with any hint";
+  report_performance(__FILE__, ostr.str().c_str(),
+		     time_any_hint1, resource_any_hint1);
+
+  ostr.str("");
+  ostr << "unordered_set " << nb_copies << " X " << sz / nb_copies
+       << " insertions with good hint";
+  report_performance(__FILE__, ostr.str().c_str(),
+		     time_good_hint, resource_good_hint);
+
+  ostr.str("");
+  ostr << "unordered_set " << nb_copies << " X " << sz / nb_copies
+       << " insertions with perfect hint";
+  report_performance(__FILE__, ostr.str().c_str(),
+		     time_perfect_hint1, resource_perfect_hint1);
+
+  ostr.str("");
+  ostr << "unordered_set " << sz / nb_copies << " X " << nb_copies
+       << " insertions w/o hint";
+  report_performance(__FILE__, ostr.str().c_str(),
+		     time_no_hint2, resource_no_hint2);
+
+  ostr.str("");
+  ostr << "unordered_set " << sz / nb_copies << " X " << nb_copies
+       << " insertions with any hint";
+  report_performance(__FILE__, ostr.str().c_str(),
+		     time_any_hint2, resource_any_hint2);
+
+  ostr.str("");
+  ostr << "unordered_set " << sz / nb_copies << " X " << nb_copies
+       << " insertions with bad hint";
+  report_performance(__FILE__, ostr.str().c_str(),
+		     time_bad_hint, resource_bad_hint);
+
+  ostr.str("");
+  ostr << "unordered_set " << sz / nb_copies << " X " << nb_copies
+       << " insertions with perfect hint";
+  report_performance(__FILE__, ostr.str().c_str(),
+		     time_perfect_hint2, resource_perfect_hint2);
+  return 0;
+}
Index: doc/xml/manual/containers.xml
===================================================================
--- doc/xml/manual/containers.xml	(revision 199457)
+++ doc/xml/manual/containers.xml	(working copy)
@@ -354,6 +354,60 @@ 
   <info><title>Unordered Associative</title></info>
   <?dbhtml filename="unordered_associative.html"?>
 
+  <section xml:id="containers.unordered.insert_hints" xreflabel="Insertion Hints">
+    <info><title>Insertion Hints</title></info>
+
+    <para>
+     Here is how the hinting works in the libstdc++ implementation of unordered
+     containers, and the rational behind this behavior.
+    </para>
+    <para>
+      In the following text, the phrase <emphasis>equivalent to</emphasis> refer
+      to the result of the invocation of the equal predicate imposed on the
+      container by its <code>key_equal</code> object, which defaults to (basically)
+      <quote>==</quote>.
+    </para>
+    <para>
+      Unordered containers can be seen as a <code>std::vector</code> of
+      <code>std::forward_list</code>. The <code>std::vector</code> represents
+      the buckets and each <code>std::forward_list</code> is the list of nodes
+      belonging to the same bucket. When inserting an element in such a data
+      structure we first need to compute the element hash code to find the
+      bucket to insert the element to, the second step depends on the uniqueness
+      of elements in the container.
+    </para>
+    <para>
+      In the case of <code>std::unordered_set</code> and
+      <code>std::unordered_map</code> you need to look through all bucket's
+      elements for an equivalent one. If there is none the insertion can be
+      achieved, otherwise the insertion fails. As we always need to loop though
+      all bucket's elements, the hint doesn't tell us if the element is already
+      present, and we don't have any constraint on where the new element is to
+      be inserted, the hint won't be of any help and will then be ignored.
+    </para>
+    <para>
+      In the case of <code>std::unordered_multiset</code>
+      and <code>std::unordered_multimap</code> equivalent elements must be
+      linked together so that the <code>equal_range(const key_type&amp;)</code>
+      can return the range of iterators pointing to all equivalent elements.
+      This is where hinting can be used to point to another equivalent element
+      already part of the container and so skip all non equivalent elements of
+      the bucket. So to be useful the hint shall point to an element equivalent
+      to the one being inserted. The new element will be then inserted right
+      after the hint. Note that because of an implementation detail inserting
+      after a node can require to update the bucket of the following node. To
+      check if the next bucket is to be modified we need to compute following
+      node hash code. So if you want your hint to be really efficient it should
+      be followed by another equivalent element, the implementation will detect
+      this equivalence and won't compute next element hash code.
+    </para>
+    <para>
+      It is highly advised to start using unordered containers hints only if you
+      have a bench that will demonstrate the benefit of it. If you don't then do
+      not use hints, it might do more bad than good.
+    </para>
+  </section>
+
   <section xml:id="containers.unordered.hash" xreflabel="Hash">
     <info><title>Hash Code</title></info>