diff mbox

Enhance std::hash for pointers

Message ID 554A751E.9030009@gmail.com
State New

Commit Message

François Dumont May 6, 2015, 8:10 p.m. UTC
Hi

     Following Marc Glisse's comment #4 on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641 I would like to 
propose this enhancement to the hash functor for pointers. It simply 
discards the low-order bits of the pointer's hash code that carry no 
information because of the memory alignment of the pointed-to type. The 
only drawback I can think of is that the type needs to be complete at 
std::hash instantiation time, but is that really an issue?
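
The idea can be sketched outside libstdc++ as follows. This is only an illustration under stated assumptions: `log2_floor`, `aligned_pointer_hash`, `Block` and `adjacent_hash_distance` are hypothetical names that do not appear in the patch, and the sketch relies on `alignof(T)` being a power of two and on `T` being complete where the functor is instantiated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: floor(log2(n)) for n >= 1. For a power-of-two
// alignment n this is the number of low address bits that are always zero.
constexpr std::size_t log2_floor(std::size_t n)
{ return n <= 1 ? 0 : 1 + log2_floor(n >> 1); }

// Hypothetical functor mirroring the proposal: shift out the alignment bits.
// alignof(T) requires T to be a complete type at instantiation time.
template<typename T>
struct aligned_pointer_hash
{
  std::size_t operator()(T* p) const noexcept
  {
    return static_cast<std::size_t>(
      reinterpret_cast<std::uintptr_t>(p) >> log2_floor(alignof(T)));
  }
};

// Illustration type with a known size and alignment of 16 bytes.
struct alignas(16) Block { char data[16]; };

// Hash distance between two adjacent Block objects in an array.
inline std::size_t adjacent_hash_distance()
{
  static Block blocks[2];
  // Both addresses are multiples of 16, so their 4 low bits are zero.
  assert(reinterpret_cast<std::uintptr_t>(&blocks[0]) % alignof(Block) == 0);
  aligned_pointer_hash<Block> h;
  return h(&blocks[1]) - h(&blocks[0]);
}
```

With the identity hash the two addresses would differ by 16; after the shift they differ by 1, so consecutive objects map to consecutive hash codes.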

     IMO it is quite obvious that the resulting hash code will be better, 
but if anyone has a good method to prove it I can try to implement it. 
The test I have added in quality.cc is very basic and just reflects the 
enhancement following Marc's comment.

2015-05-05  François Dumont <fdumont@gcc.gnu.org>

     * include/bits/functional_hash.h
     (std::__detail::_Lowest_power_of_two<size_t>): New.
     (std::hash<_Tp*>::operator()): Use latter.
     * testsuite/20_util/hash/quality.cc (pointer_quality_test): New.

Tested under Linux x86_64.

François

Comments

Richard Biener May 8, 2015, 8:02 a.m. UTC | #1
On Wed, May 6, 2015 at 10:10 PM, François Dumont <frs.dumont@gmail.com> wrote:
> Hi
>
>     Following Marc Glisse's comment #4 on
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641 I would like to
> propose this enhancement to the hash functor for pointers. It simply
> discards the low-order bits of the pointer's hash code that carry no
> information because of the memory alignment of the pointed-to type. The only
> drawback I can think of is that the type needs to be complete at std::hash
> instantiation time, but is that really an issue?
>
>     IMO it is quite obvious that the resulting hash code will be better but

If you use a real hashing function, that's not true.  That is, something
other than GCC's pointer_hash (void *p) { return (uintptr_t)p >> 3; }.
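
One way to read this remark: once a hash function mixes its input bits, the always-zero low bits of an aligned pointer stop mattering, so shifting them out first gains nothing. The sketch below uses multiplicative (Fibonacci) hashing purely as a stand-in for a "real" hashing function; `mix_bucket` and `identity_bucket` are hypothetical names, and the multiplier is the commonly used 64-bit golden-ratio constant, not something taken from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a mixing hash: multiply by a large odd constant
// (about 2^64 divided by the golden ratio) and keep the top `bits` bits
// as the bucket index.
inline std::uint64_t mix_bucket(std::uint64_t key, unsigned bits)
{
  assert(bits >= 1 && bits <= 63);
  return (key * 0x9E3779B97F4A7C15ull) >> (64 - bits);
}

// Identity hash reduced to 2^bits buckets by masking, for comparison.
inline std::uint64_t identity_bucket(std::uint64_t key, unsigned bits)
{
  assert(bits >= 1 && bits <= 63);
  return key & ((std::uint64_t(1) << bits) - 1);
}
```

For keys that are multiples of 8, such as 0, 8 and 16, `identity_bucket(key, 3)` always returns 0, whereas `mix_bucket` already separates 0 and 8 without any prior shift.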

Richard.

> if anyone has a good method to prove it I can try to implement it. The test
> I have added in quality.cc is very basic and just reflects the enhancement
> following Marc's comment.
>
> 2015-05-05  François Dumont <fdumont@gcc.gnu.org>
>
>     * include/bits/functional_hash.h
>     (std::__detail::_Lowest_power_of_two<size_t>): New.
>     (std::hash<_Tp*>::operator()): Use latter.
>     * testsuite/20_util/hash/quality.cc (pointer_quality_test): New.
>
> Tested under Linux x86_64.
>
> François
>
François Dumont May 8, 2015, 8:18 p.m. UTC | #2
On 08/05/2015 10:02, Richard Biener wrote:
> On Wed, May 6, 2015 at 10:10 PM, François Dumont <frs.dumont@gmail.com> wrote:
>> Hi
>>
>>      Following Marc Glisse's comment #4 on
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641 I would like to
>> propose this enhancement to the hash functor for pointers. It simply
>> discards the low-order bits of the pointer's hash code that carry no
>> information because of the memory alignment of the pointed-to type. The only
>> drawback I can think of is that the type needs to be complete at std::hash
>> instantiation time, but is that really an issue?
>>
>>      IMO it is quite obvious that the resulting hash code will be better but
> If you use a real hashing function, that's not true.  That is, something
> other than GCC's pointer_hash (void *p) { return (uintptr_t)p >> 3; }.

Sorry, I don't understand your remark. Do you mean that someone who is 
not using std::hash to hash their pointers won't benefit from the 
enhancement?

It is a good point, however, to see that gcc is using something similar 
internally.

François
Christopher Jefferson May 8, 2015, 10:10 p.m. UTC | #3
My concern with accepting this patch is that many of libstdc++'s hash
functions are awful from a mixing point of view -- users whose integers
are always a multiple of a power of 2 (which is in practice not
uncommon) would hit exactly the same problem.

Rather than try to "fix" one hash function like this, we should just
accept that our hash functions might have low-quality low-order bits.
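
The integer case can be illustrated with a small sketch. It assumes an identity hash for integers and a plain `hash % bucket_count` reduction; `distinct_buckets` is a hypothetical helper written for this illustration, not part of libstdc++.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Number of distinct buckets hit by the keys 0, stride, 2 * stride, ...
// (`count` keys in total) when the hash is the identity and the bucket
// index is hash % bucket_count.
inline std::size_t distinct_buckets(std::size_t stride, std::size_t count,
                                    std::size_t bucket_count)
{
  assert(bucket_count != 0);
  std::set<std::size_t> used;
  for (std::size_t i = 0; i < count; ++i)
    used.insert((i * stride) % bucket_count);
  return used.size();
}
```

With 16 buckets, keys that are multiples of 16 all land in a single bucket and multiples of 8 use only two, while consecutive integers use all 16. The pointer patch does not change this for integer keys.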
François Dumont May 11, 2015, 9:06 p.m. UTC | #4
My proposal should be considered independently of any context. We don't 
know what std::hash is used for in user code, which is why I am proposing 
this patch even if for the moment it doesn't make any difference 
considering only our own usage of it.

     Your remark would make more sense if we were talking about changing 
the std::unordered_xxx containers' bucket count policy to a power of 2.
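
Why the bucket count policy matters here can be sketched as follows, assuming an identity hash on addresses that are multiples of 8 and a `hash % bucket_count` reduction. `max_bucket_load` is a hypothetical helper, and 17 is just an example of a bucket count that is not a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Size of the fullest bucket after placing the keys 0, stride, 2 * stride,
// ... (`count` keys in total) with bucket index key % bucket_count.
inline std::size_t max_bucket_load(std::size_t stride, std::size_t count,
                                   std::size_t bucket_count)
{
  assert(bucket_count != 0);
  std::vector<std::size_t> load(bucket_count, 0);
  for (std::size_t i = 0; i < count; ++i)
    ++load[(i * stride) % bucket_count];
  return *std::max_element(load.begin(), load.end());
}
```

For 64 keys with stride 8, a bucket count of 16 puts 32 keys in the fullest bucket, whereas a bucket count of 17 puts at most 4 there: the zero low bits only hurt when the bucket count shares a factor with the alignment.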

François


On 09/05/2015 00:10, Christopher Jefferson wrote:
> My concern with accepting this patch is that many of libstdc++'s hash
> functions are awful from a mixing point of view -- you would get
> exactly the same problem from users who have integers which are always
> a multiple of a power of 2 (which is in practice not uncommon). This
> would give exactly the same problem.
>
> Rather than try to "fix" one hash function like this, we should just
> accept our hash functions might have low quality lower order bits.

Patch

diff --git a/libstdc++-v3/include/bits/functional_hash.h b/libstdc++-v3/include/bits/functional_hash.h
index d94843f..a217f8a 100644
--- a/libstdc++-v3/include/bits/functional_hash.h
+++ b/libstdc++-v3/include/bits/functional_hash.h
@@ -36,6 +36,29 @@ 
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
+namespace __detail
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  // Compute the exponent of the highest power of 2 lower or equal to __n.
+  template<size_t __n>
+    struct _Lowest_power_of_two
+    {
+      static const size_t __val
+        = _Lowest_power_of_two< (__n >> 1) >::__val + 1;
+    };
+
+  template<>
+    struct _Lowest_power_of_two<1>
+    { static const size_t __val = 0; };
+
+  template<>
+    struct _Lowest_power_of_two<0>
+    { static const size_t __val = 0; };
+
+_GLIBCXX_END_NAMESPACE_VERSION
+}
+
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /** @defgroup hashes Hashes
@@ -63,7 +86,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       size_t
       operator()(_Tp* __p) const noexcept
-      { return reinterpret_cast<size_t>(__p); }
+      {
+	return reinterpret_cast<size_t>(__p)
+	  >> __detail::_Lowest_power_of_two<alignof(_Tp)>::__val;
+      }
     };
 
   // Explicit specializations for integer types.
diff --git a/libstdc++-v3/testsuite/20_util/hash/quality.cc b/libstdc++-v3/testsuite/20_util/hash/quality.cc
index af417ed..d9c72c7 100644
--- a/libstdc++-v3/testsuite/20_util/hash/quality.cc
+++ b/libstdc++-v3/testsuite/20_util/hash/quality.cc
@@ -164,9 +164,20 @@  quality_test()
     }
 }
 
+void
+pointer_quality_test()
+{
+  bool test __attribute__((unused)) = true;
+
+  double d1, d2;
+  std::hash<double*> dh;
+  VERIFY( dh(&d1) % sizeof(double) != dh(&d2) % sizeof(double) );
+}
+
 int
 main()
 {
   quality_test();
+  pointer_quality_test();
   return 0;
 }
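
Note that the recursive template in the patch yields an exponent (the floor of log2), which is what a shift count needs, rather than a power of two itself. The standalone copy below checks a few values at compile time; `log2_floor_t` and `value` are hypothetical, non-reserved names substituted for the patch's identifiers so the snippet can be compiled outside libstdc++.

```cpp
#include <cstddef>

// Standalone copy of the patch's recursive template under a hypothetical
// name: value == floor(log2(N)) for N >= 1, and 0 for N == 0.
template<std::size_t N>
struct log2_floor_t
{ static const std::size_t value = log2_floor_t<(N >> 1)>::value + 1; };

// Recursion stops at 1 (log2(1) == 0) ...
template<>
struct log2_floor_t<1>
{ static const std::size_t value = 0; };

// ... and 0 is special-cased so that the recursion also terminates there.
template<>
struct log2_floor_t<0>
{ static const std::size_t value = 0; };

static_assert(log2_floor_t<1>::value == 0, "1-byte alignment: no shift");
static_assert(log2_floor_t<8>::value == 3, "8-byte alignment: drop 3 bits");
static_assert(log2_floor_t<16>::value == 4, "16-byte alignment: drop 4 bits");
static_assert(log2_floor_t<6>::value == 2, "non powers of two round down");
```

Since alignments are powers of two, the rounding-down case never arises for `alignof(_Tp)`; it is shown only to document the behaviour of the recursion.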