From patchwork Tue Apr 10 14:53:32 2012
X-Patchwork-Submitter: Michael Matz
X-Patchwork-Id: 151608
Date: Tue, 10 Apr 2012 16:53:32 +0200 (CEST)
From: Michael Matz
To: gcc-patches@gcc.gnu.org
Cc: fortran@gcc.gnu.org
Subject: Guard use of modulo in cshift (speedup protein)

Hi,

this patch speeds up polyhedrons protein on Bulldozer quite a bit.  The thing is that in this testcase cshift is called with a very short length (<=3) and the shift amount is always less than the length.
Surprisingly the division instruction takes up a considerable amount of time, so much that it makes sense to guard it when the shift is in bounds.

Here's some oprofile of _gfortrani_cshift0_i4 (total 31020 cycles):

    23  0.0032 :   caf00:       idiv   %r13
 13863  1.9055 :   caf03:       lea    (%rdx,%r13,1),%r12

I.e. despite the memory shuffling one third of the cshift cycles are that division.  With the patch the time for protein drops from 0m21.367s to 0m20.547s on this Bulldozer machine.  I've checked that it has no adverse effect on older AMD or Intel cores (0:44.30elapsed vs 0:44.00elapsed, still an improvement).

Regstrapped on x86_64-linux.  Okay for trunk?

Ciao,
Michael.

    * m4/cshift0.m4 (cshift0_'rtype_code`): Guard use of modulo.
    * generated/cshift0_c10.c: Regenerated.
    * generated/cshift0_c16.c: Regenerated.
    * generated/cshift0_c4.c: Regenerated.
    * generated/cshift0_c8.c: Regenerated.
    * generated/cshift0_i16.c: Regenerated.
    * generated/cshift0_i1.c: Regenerated.
    * generated/cshift0_i2.c: Regenerated.
    * generated/cshift0_i4.c: Regenerated.
    * generated/cshift0_i8.c: Regenerated.
    * generated/cshift0_r10.c: Regenerated.
    * generated/cshift0_r16.c: Regenerated.
    * generated/cshift0_r4.c: Regenerated.
    * generated/cshift0_r8.c: Regenerated.

Index: m4/cshift0.m4
===================================================================
--- m4/cshift0.m4 (revision 186272)
+++ m4/cshift0.m4 (working copy)
@@ -98,9 +98,13 @@ cshift0_'rtype_code` ('rtype` *ret, cons
   rptr = ret->base_addr;
   sptr = array->base_addr;
 
-  shift = len == 0 ? 0 : shift % (ptrdiff_t)len;
-  if (shift < 0)
-    shift += len;
+  /* Avoid the costly modulo for trivially in-bound shifts.  */
+  if (shift < 0 || shift >= len)
+    {
+      shift = len == 0 ? 0 : shift % (ptrdiff_t)len;
+      if (shift < 0)
+        shift += len;
+    }
 
   while (rptr)
     {
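
For reviewers who want to convince themselves that the guard does not change behaviour, here is a minimal standalone sketch, not part of the patch.  The function names (normalize_shift_always, normalize_shift_guarded, normalizations_agree, check_small_ranges) are made up for illustration, and both arguments are assumed to be ptrdiff_t, which may not match the types used in the m4 template exactly.  It compares the old and the new normalization over a small range of lengths and shifts, including len == 0 and negative shifts.

```c
#include <assert.h>
#include <stddef.h>

/* Old behaviour: always pay for the modulo (hypothetical helper).  */
static ptrdiff_t
normalize_shift_always (ptrdiff_t shift, ptrdiff_t len)
{
  shift = len == 0 ? 0 : shift % len;
  if (shift < 0)
    shift += len;
  return shift;
}

/* New behaviour: skip the modulo when 0 <= shift < len
   (hypothetical helper mirroring the guarded code in the patch).  */
static ptrdiff_t
normalize_shift_guarded (ptrdiff_t shift, ptrdiff_t len)
{
  if (shift < 0 || shift >= len)
    {
      shift = len == 0 ? 0 : shift % len;
      if (shift < 0)
        shift += len;
    }
  return shift;
}

/* Return 1 if both variants agree for every len in [0, max_len]
   and every shift in [-max_shift, max_shift], else 0.  */
static int
normalizations_agree (ptrdiff_t max_len, ptrdiff_t max_shift)
{
  ptrdiff_t len, shift;

  for (len = 0; len <= max_len; len++)
    for (shift = -max_shift; shift <= max_shift; shift++)
      if (normalize_shift_always (shift, len)
          != normalize_shift_guarded (shift, len))
        return 0;
  return 1;
}

/* Small self-check over lengths 0..8 and shifts -64..64.  */
static void
check_small_ranges (void)
{
  assert (normalizations_agree (8, 64));
}
```

Note that for len == 0 the guard is always taken (shift >= len holds for any non-negative shift), so the zero-length case still ends up with shift == 0 as before.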