[libstdc++-v3,parallel,mode] Correct part lengths calculation for parallel partial_sum

Message ID 4C0DEF94.5020404@kit.edu
State New

Commit Message

Johannes Singler June 8, 2010, 7:21 a.m. UTC
This patch corrects the calculation of the part lengths for parallel
partial_sum, leading to the expected behavior for
partial_sum_dilation!=1, and thus better performance.

Tested x86_64-unknown-linux-gnu: No regressions.

Please approve for mainline.

2010-06-08  Johannes Singler  <singler@kit.edu>

        * include/parallel/partial_sum.h
        (__parallel_partial_sum_linear):
        Correctly calculate part lengths for partial_sum_dilation!=1.

Johannes
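
For readers who want to see the effect in numbers, the following is a minimal sketch of the arithmetic on the removed and added lines of the patch. It is not the library code: the names PartLengths, old_part_lengths, new_part_lengths and the DifferenceType typedef are hypothetical stand-ins, and the reserved double-underscore names are dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Stand-in for the library's _DifferenceType; an assumption for this sketch.
typedef std::ptrdiff_t DifferenceType;

// Hypothetical result type, not part of libstdc++.
struct PartLengths
{
  DifferenceType chunk_length;
  DifferenceType borderstart;
};

// Arithmetic of the removed lines: n / (num_threads + dilation).
PartLengths
old_part_lengths(DifferenceType n, DifferenceType num_threads, float dilation)
{
  assert(num_threads > 0);
  PartLengths r;
  r.chunk_length = static_cast<DifferenceType>(
      (double)n / ((double)num_threads + dilation));
  r.borderstart = n - num_threads * r.chunk_length;
  return r;
}

// Arithmetic of the added lines: the first part is sized first (at least 1),
// then the remaining elements are split evenly across the threads.
PartLengths
new_part_lengths(DifferenceType n, DifferenceType num_threads, float dilation)
{
  assert(num_threads > 0);
  DifferenceType first_part_length = std::max<DifferenceType>(1,
      static_cast<DifferenceType>(
          (float)n / (1.0 + dilation * (float)num_threads)));
  PartLengths r;
  r.chunk_length = (n - first_part_length) / num_threads;
  r.borderstart = n - num_threads * r.chunk_length;
  return r;
}
```

With n = 1000, four threads and a dilation of 2, the old expression gives a chunk length of 166 and a borderstart of 336, while the patched one gives a chunk length of 222 and a borderstart of 112. In the patched form each chunk is roughly dilation times the first part; in the old form the relation was the other way round.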

Comments

Jonathan Wakely June 8, 2010, 7:39 a.m. UTC | #1
On 8 June 2010 08:21, Johannes Singler wrote:
> This patch corrects the calculation of the part lengths for parallel
> partial_sum, leading to the expected behavior for
> partial_sum_dilation!=1, and thus better performance.
>
> Tested x86_64-unknown-linux-gnu: No regressions.
>
> Please approve for mainline.
>
> 2010-06-08  Johannes Singler  <singler@kit.edu>
>
>        * include/parallel/partial_sum.h
>        (__parallel_partial_sum_linear):
>        Correctly calculate part lengths for partial_sum_dilation!=1.

Please declare each variable on its own line, as per
http://gcc.gnu.org/onlinedocs/libstdc++/manual/source_code_style.html

      char* c = "abc";  // each variable goes on its own line, always.

(Unfortunately the coding style guide seems to have got mangled by the
conversion to DocBook; I'll fix that.)
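
As an illustration of the request, the three declarations from the patch might be split along these lines. This is only a sketch: the wrapping function borderstart_one_per_line and the DifferenceType typedef are hypothetical, the leading underscores are dropped because such names are reserved for the implementation, and the explicit cast is added here for clarity.

```cpp
#include <algorithm>
#include <cstddef>

// Stand-in for the library's _DifferenceType; an assumption for this sketch.
typedef std::ptrdiff_t DifferenceType;

// Hypothetical free function wrapping the declarations from the patch.
DifferenceType
borderstart_one_per_line(DifferenceType n, DifferenceType num_threads,
                         float partial_sum_dilation)
{
  // One declaration per statement instead of one comma-separated list.
  DifferenceType first_part_length = std::max<DifferenceType>(1,
      static_cast<DifferenceType>(
          (float)n / (1.0 + partial_sum_dilation * (float)num_threads)));
  DifferenceType chunk_length = (n - first_part_length) / num_threads;
  DifferenceType borderstart = n - num_threads * chunk_length;
  return borderstart;
}
```

Each variable now has its own type name and its own terminating semicolon, so a later edit to one initializer cannot silently change the meaning of the next.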

Patch

Index: include/parallel/partial_sum.h
===================================================================
--- include/parallel/partial_sum.h	(revision 160253)
+++ include/parallel/partial_sum.h	(working copy)
@@ -127,10 +127,12 @@ 
 	    equally_split(__n, __num_threads + 1, __borders);
 	  else
 	    {
-	      _DifferenceType __chunk_length =
-		((double)__n
-		 / ((double)__num_threads + __s.partial_sum_dilation)),
-		__borderstart = __n - __num_threads * __chunk_length;
+	      _DifferenceType
+		  __first_part_length = std::max<_DifferenceType>(1,
+		    (float)__n /
+		    (1.0 + __s.partial_sum_dilation * (float)__num_threads)),
+		  __chunk_length = (__n - __first_part_length) / __num_threads,
+		  __borderstart = __n - __num_threads * __chunk_length;
 	      __borders[0] = 0;
 	      for (_ThreadIndex __i = 1; __i < (__num_threads + 1); ++__i)
 		{