From patchwork Tue Jun 8 14:07:32 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Singler X-Patchwork-Id: 54979 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id EC770B7D47 for ; Wed, 9 Jun 2010 00:07:56 +1000 (EST) Received: (qmail 3203 invoked by alias); 8 Jun 2010 14:07:52 -0000 Received: (qmail 3179 invoked by uid 22791); 8 Jun 2010 14:07:50 -0000 X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00 X-Spam-Check-By: sourceware.org Received: from iramx2.ira.uni-karlsruhe.de (HELO iramx2.ira.uni-karlsruhe.de) (141.3.10.81) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 08 Jun 2010 14:07:41 +0000 Received: from irams1.ira.uni-karlsruhe.de ([141.3.10.5]) by iramx2.ira.uni-karlsruhe.de with esmtps port 25 id 1OLzSn-0006PU-6E; Tue, 08 Jun 2010 16:07:38 +0200 Received: from i10pc67.iti.kit.edu ([141.3.24.67]) by irams1.ira.uni-karlsruhe.de with esmtps port 465 id 1OLzSn-0001Fg-1V; Tue, 08 Jun 2010 16:07:33 +0200 Message-ID: <4C0E4EA4.8080904@kit.edu> Date: Tue, 08 Jun 2010 16:07:32 +0200 From: Johannes Singler User-Agent: Thunderbird 2.0.0.24 (X11/20100302) MIME-Version: 1.0 To: libstdc++ , gcc-patches@gcc.gnu.org Subject: [PATCH][libstdc++-v3 parallel mode] Improve find algorithm X-ATIS-AV: ClamAV (irams1.ira.uni-karlsruhe.de) X-ATIS-AV: ClamAV (iramx2.ira.uni-karlsruhe.de) X-ATIS-AV: Kaspersky (iramx2.ira.uni-karlsruhe.de) X-ATIS-Timestamp: iramx2.ira.uni-karlsruhe.de 1276006059.063701000 X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org This patch improves the parallel find algorithm by making the block size proportional to the current start position, which gives nice theoretical performance guarantees and better performance in practice. Tested x86_64-unknown-linux-gnu: No regressions Please approve for mainline. 2010-06-08 Johannes Singler * include/parallel/find.h (__find_template(.., growing_blocks_tag)): Make block size proportional to current position. * include/parallel/settings.h (_Settings): Introduce new tuning parameter find_scale_factor to the end of the struct, default to 0.01. Johannes Index: include/parallel/settings.h =================================================================== --- include/parallel/settings.h (revision 160424) +++ include/parallel/settings.h (working copy) @@ -272,6 +272,9 @@ /// Minimal input size for search and search_n. _SequenceIndex search_minimal_n; + /// Block size scale-down factor with respect to current position. + float find_scale_factor; + /// Get the global settings. _GLIBCXX_CONST static const _Settings& get() throw(); @@ -331,7 +334,8 @@ TLB_size(128), cache_line_size(64), qsb_steals(0), - search_minimal_n(1000) + search_minimal_n(1000), + find_scale_factor(0.01f) { } }; } Index: include/parallel/find.h =================================================================== --- include/parallel/find.h (revision 160424) +++ include/parallel/find.h (working copy) @@ -168,9 +168,7 @@ * @param __selector _Functionality (e. g. std::find_if(), std::equal(),...) * @return Place of finding in both sequences. * @see __gnu_parallel::_Settings::find_sequential_search_size - * @see __gnu_parallel::_Settings::find_initial_block_size - * @see __gnu_parallel::_Settings::find_maximum_block_size - * @see __gnu_parallel::_Settings::find_increasing_factor + * @see __gnu_parallel::_Settings::find_scale_factor * * There are two main differences between the growing blocks and * the constant-size blocks variants. @@ -218,6 +216,8 @@ omp_lock_t __result_lock; omp_init_lock(&__result_lock); + const float __scale_factor = __s.find_scale_factor; + _ThreadIndex __num_threads = __get_max_threads(); # pragma omp parallel shared(__result) num_threads(__num_threads) { @@ -227,7 +227,8 @@ // Not within first __k elements -> start parallel. _ThreadIndex __iam = omp_get_thread_num(); - _DifferenceType __block_size = __s.find_initial_block_size; + _DifferenceType __block_size = + std::max<_DifferenceType>(1, __scale_factor * __next_block_start); _DifferenceType __start = __fetch_and_add<_DifferenceType> (&__next_block_start, __block_size); @@ -265,15 +266,14 @@ omp_unset_lock(&__result_lock); } - __block_size = std::min<_DifferenceType> - (__block_size * __s.find_increasing_factor, - __s.find_maximum_block_size); + _DifferenceType __block_size = + std::max<_DifferenceType>(1, __scale_factor * __next_block_start); // Get new block, update pointer to next block. __start = __fetch_and_add<_DifferenceType>(&__next_block_start, __block_size); - __stop = (__length < (__start + __block_size) - ? __length : (__start + __block_size)); + __stop = + std::min<_DifferenceType>(__length, __start + __block_size); } } //parallel