From patchwork Fri Sep 2 18:20:52 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eelis X-Patchwork-Id: 665283 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3sQnW7145tz9svs for ; Sat, 3 Sep 2016 04:21:06 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=U4y311Uy; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:cc:from:message-id:date:mime-version :in-reply-to:content-type; q=dns; s=default; b=NdqBuS+vYwjKxfpw4 ZsocwKIeWLnW3CfCNzFa6XhePHDDC8NDd62FA2wgYLBDNxfB78FhLQ5zpJkFPTZ0 ilxjD8bhKeS3lYR9KngQCjxUGBuUz26X8KNxuLLaSfpF5R70VE7q8eWBzQNtPbVL CtGSVQd2ORKOGd67SoSfz8Vsag= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:cc:from:message-id:date:mime-version :in-reply-to:content-type; s=default; bh=fgKeXGObJ5cduYLhthooIE8 RxCI=; b=U4y311UytLkEALmDp9KF13c7iqUKfjbAfXiXO3TNEG4CmHW9bFJMROq LoJp+GXm6hCHC6FTKEMIox1lfMRb6l1kKb6j1C9p4djXQgaNyLoL4OsKHvpsTxO/ AwYXGwPFbXx7tueG9sB4Nh4GBELAdhUTtEpXKoHNy9AZ/3t2zrac= Received: (qmail 17617 invoked by alias); 2 Sep 2016 18:20:58 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 17598 invoked by uid 89); 2 Sep 2016 18:20:57 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL, BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=shave, successive, (unknown), __d X-Spam-User: qpsmtpd, 2 recipients X-HELO: smtpq4.tb.mail.iss.as9143.net Received: from smtpq4.tb.mail.iss.as9143.net (HELO smtpq4.tb.mail.iss.as9143.net) (212.54.42.167) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 02 Sep 2016 18:20:56 +0000 Received: from [212.54.42.117] (helo=lsmtp3.tb.mail.iss.as9143.net) by smtpq4.tb.mail.iss.as9143.net with esmtp (Exim 4.82) (envelope-from ) id 1bft5B-0005SL-6H; Fri, 02 Sep 2016 20:20:53 +0200 Received: from i23017.upc-i.chello.nl ([62.195.23.17] helo=[192.168.0.16]) by lsmtp3.tb.mail.iss.as9143.net with esmtpa (Exim 4.82) (envelope-from ) id 1bft5B-0002U1-3k; Fri, 02 Sep 2016 20:20:53 +0200 Subject: Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible To: Jonathan Wakely References: <5728B8CC.7030405@eelis.net> <20160831124502.GH3342@redhat.com> Cc: libstdc++ , gcc-patches From: Eelis van der Weegen Message-ID: <57C9C304.90607@eelis.net> Date: Fri, 2 Sep 2016 20:20:52 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 In-Reply-To: <20160831124502.GH3342@redhat.com> X-Ziggo-spambar: / X-Ziggo-spamscore: 0.0 X-Ziggo-spamreport: CMAE Analysis: v=2.1 cv=PcUC/XVd c=1 sm=0 tr=0 a=9+rZDBEiDlHhcck0kWbJtElFXBc=:19 a=GW1xBdLrtEIA:10 a=r77TgQKjGQsHNAKrUKIA:9 a=jTolifSA960tx58dAYkA:9 a=pILNOxqGKmIA:10 a=3bsvoFs8WQOPrlJBYbQA:9 a=5NaJI7ZxtiFyhjtr:21 a=d-cz1USqJ0Aebobo:21 xcat=Undefined/Undefined none X-Ziggo-Spam-Status: No On 2016-08-31 14:45, Jonathan Wakely wrote: > Is this significantly faster than just using > uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't > need to duplicate the logic? (And people maintaining the code won't > reconvince themselves it's correct every time they look at it :-) > >[..] > > Could we hoist this test out of the loop somehow? > > If we change the loop condition to be __i+1 < __last we don't need to > test it on every iteration, and then after the loop we can just do > the final swap if (__urange % 2). Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below. I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay! Updated patch attached. Index: libstdc++-v3/include/bits/stl_algo.h =================================================================== --- libstdc++-v3/include/bits/stl_algo.h (revision 239895) +++ libstdc++-v3/include/bits/stl_algo.h (working copy) @@ -3772,6 +3772,46 @@ typedef typename std::make_unsigned<_DistanceType>::type __ud_type; typedef typename std::uniform_int_distribution<__ud_type> __distr_type; typedef typename __distr_type::param_type __p_type; + + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; + typedef typename std::common_type::type __uc_type; + + const __uc_type __urngrange = __g.max() - __g.min(); + const __uc_type __urange = __uc_type(__last - __first); + + if (__urngrange / __urange >= __urange) + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. + { + _RandomAccessIterator __i = __first + 1; + + // Since we know the range isn't empty, an even number of elements + // means an uneven number of elements /to swap/, in which case we + // do the first one up front: + + if ((__urange % 2) == 0) + { + std::iter_swap(__i++, __first + (__g() & 1)); + } + + // Now we know that __last - __i is even, so we do the rest in pairs, + // using a single distribution invocation to produce swap positions + // for two successive elements at a time: + + while (__i != __last) + { + const __uc_type __swap_range = __uc_type(__i - __first) + 1; + const __uc_type __comp_range = __swap_range * (__swap_range + 1); + + std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1}; + const __uc_type __pospos = __d(__g); + + std::iter_swap(__i++, __first + (__pospos % __swap_range)); + std::iter_swap(__i++, __first + (__pospos / __swap_range)); + } + + return; + } + __distr_type __d; for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)