Patchwork [libstdc++-v3,parallel,mode] Correct part lengths calculation for parallel partial_sum

login
register
mail settings
Submitter Johannes Singler
Date June 8, 2010, 8:05 a.m.
Message ID <4C0DF9CD.9010807@kit.edu>
Download mbox | patch
Permalink /patch/54947/
State New
Headers show

Comments

Johannes Singler - June 8, 2010, 8:05 a.m.
Jonathan Wakely wrote:
> On 8 June 2010 08:21, Johannes Singler wrote:
>> This patch corrects the calculation of the part lengths for parallel
>> partial_sum, leading to the expected behavior for
>> partial_sum_dilation!=1, and thus better performance.
>>
>> Tested x86_64-unknown-linux-gnu: No regressions.
>>
>> Please approve for mainline.
>>
>> 2010-06-08  Johannes Singler  <singler@kit.edu>
>>
>>        * include/parallel/partial_sum.h
>>        (__parallel_partial_sum_linear):
>>        Correctly calculate part lengths for partial_sum_dilation!=1.
> 
> Please declare each variable on its own line, as per
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/source_code_style.html
> 
>       char* c = "abc";  // each variable goes on its own line, always.

Okay, I hope you like the attached version better.

Johannes
Jonathan Wakely - June 8, 2010, 8:54 a.m.
On 8 June 2010 09:05, Johannes Singler wrote:
> Jonathan Wakely wrote:
>> On 8 June 2010 08:21, Johannes Singler wrote:
>>> This patch corrects the calculation of the part lengths for parallel
>>> partial_sum, leading to the expected behavior for
>>> partial_sum_dilation!=1, and thus better performance.
>>>
>>> Tested x86_64-unknown-linux-gnu: No regressions.
>>>
>>> Please approve for mainline.
>>>
>>> 2010-06-08  Johannes Singler  <singler@kit.edu>
>>>
>>>        * include/parallel/partial_sum.h
>>>        (__parallel_partial_sum_linear):
>>>        Correctly calculate part lengths for partial_sum_dilation!=1.
>>
>> Please declare each variable on its own line, as per
>> http://gcc.gnu.org/onlinedocs/libstdc++/manual/source_code_style.html
>>
>>       char* c = "abc";  // each variable goes on its own line, always.
>
> Okay, I hope you like the attached version better.

Yes, thanks. Are the casts to float necessary in the initializer for
first_part_length? The 1.0 means the result will be a double anyway.
Should that be 1.0f, or are the (float) casts just visual noise?

In any case, I think it's OK for mainline, thanks.

Jonathan
Johannes Singler - June 8, 2010, 9:27 a.m.
Jonathan Wakely wrote:
> On 8 June 2010 09:05, Johannes Singler wrote:
>> Jonathan Wakely wrote:
>>> On 8 June 2010 08:21, Johannes Singler wrote:
>>>> This patch corrects the calculation of the part lengths for parallel
>>>> partial_sum, leading to the expected behavior for
>>>> partial_sum_dilation!=1, and thus better performance.
>>>>
>>>> Tested x86_64-unknown-linux-gnu: No regressions.
>>>>
>>>> Please approve for mainline.
>>>>
>>>> 2010-06-08  Johannes Singler  <singler@kit.edu>
>>>>
>>>>        * include/parallel/partial_sum.h
>>>>        (__parallel_partial_sum_linear):
>>>>        Correctly calculate part lengths for partial_sum_dilation!=1.
>>> Please declare each variable on its own line, as per
>>> http://gcc.gnu.org/onlinedocs/libstdc++/manual/source_code_style.html
>>>
>>>       char* c = "abc";  // each variable goes on its own line, always.
>> Okay, I hope you like the attached version better.
> 
> Yes, thanks. Are the casts to float necessary in the initializer for
> first_part_length? The 1.0 means the result will be a double anyway.
> Should that be 1.0f, or are the (float) casts just visual noise?

Good point, I have cleaned that up, and commited.

Johannes

Patch

Index: include/parallel/partial_sum.h
===================================================================
--- include/parallel/partial_sum.h	(revision 160424)
+++ include/parallel/partial_sum.h	(working copy)
@@ -127,10 +127,13 @@ 
 	    equally_split(__n, __num_threads + 1, __borders);
 	  else
 	    {
+	      _DifferenceType __first_part_length =
+		  std::max<_DifferenceType>(1, (float)__n /
+		    (1.0 + __s.partial_sum_dilation * (float)__num_threads));
 	      _DifferenceType __chunk_length =
-		((double)__n
-		 / ((double)__num_threads + __s.partial_sum_dilation)),
-		__borderstart = __n - __num_threads * __chunk_length;
+		  (__n - __first_part_length) / __num_threads;
+	      _DifferenceType __borderstart =
+		  __n - __num_threads * __chunk_length;
 	      __borders[0] = 0;
 	      for (_ThreadIndex __i = 1; __i < (__num_threads + 1); ++__i)
 		{