diff mbox

[libstdc++] std::shuffle: Generate two swap positions at a time if possible

Message ID 20161014194150.GG2922@redhat.com
State New
Headers show

Commit Message

Jonathan Wakely Oct. 14, 2016, 7:41 p.m. UTC
On 02/09/16 20:53 +0200, Eelis wrote:
>On 2016-09-02 20:20, Eelis van der Weegen wrote:
>>On 2016-08-31 14:45, Jonathan Wakely wrote:
>>>Is this significantly faster than just using
>>>uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
>>>need to duplicate the logic? (And people maintaining the code won't
>>>reconvince themselves it's correct every time they look at it :-)
>>>
>>>[..]
>>>
>>>Could we hoist this test out of the loop somehow?
>>>
>>>If we change the loop condition to be __i+1 < __last we don't need to
>>>test it on every iteration, and then after the loop we can just do
>>>the final swap if (__urange % 2).
>>
>>Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below.
>>
>>I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay!
>>
>>Updated patch attached.
>>
>
>Please ignore that patch, I used __g()&1 but that's invalid (the new "UniformRandomBitGenerator" name is misleading).
>
>Updated patch (which uses a proper distribution even for the [0,1] case) attached.


I've finally got round to committing this patch to trunk.

Thanks for your patience, and sorry for the delay!
diff mbox

Patch

commit 1ddf9566764a85da4826628098b352bd30ba2bbc
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Oct 14 20:27:13 2016 +0100

    Optimize std::shuffle by using generator to get two values at once
    
    2016-10-14  Eelis van der Weegen  <eelis@eelis.net>
    
    	* include/bits/stl_algo.h (shuffle): Extract two random numbers from
    	each generator invocation when its range is large enough.

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 0538a79..db99cb8 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3772,6 +3772,47 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
       typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
       typedef typename __distr_type::param_type __p_type;
+
+      typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
+      typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
+
+      const __uc_type __urngrange = __g.max() - __g.min();
+      const __uc_type __urange = __uc_type(__last - __first);
+
+      if (__urngrange / __urange >= __urange)
+        // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
+      {
+	_RandomAccessIterator __i = __first + 1;
+
+	// Since we know the range isn't empty, an even number of elements
+	// means an uneven number of elements /to swap/, in which case we
+	// do the first one up front:
+
+	if ((__urange % 2) == 0)
+	{
+	  __distr_type __d{0, 1};
+	  std::iter_swap(__i++, __first + __d(__g));
+	}
+
+	// Now we know that __last - __i is even, so we do the rest in pairs,
+	// using a single distribution invocation to produce swap positions
+	// for two successive elements at a time:
+
+	while (__i != __last)
+	{
+	  const __uc_type __swap_range = __uc_type(__i - __first) + 1;
+	  const __uc_type __comp_range = __swap_range * (__swap_range + 1);
+
+	  std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1};
+	  const __uc_type __pospos = __d(__g);
+
+	  std::iter_swap(__i++, __first + (__pospos % __swap_range));
+	  std::iter_swap(__i++, __first + (__pospos / __swap_range));
+	}
+
+	return;
+      }
+
       __distr_type __d;
 
       for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)