From patchwork Wed Apr 17 15:49:35 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Jambor
X-Patchwork-Id: 237270
Date: Wed, 17 Apr 2013 17:49:35 +0200
From: Martin Jambor
To: GCC Patches
Subject: [PATCH, PR 10474] Schedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue
Message-ID: <20130417154935.GC3656@virgil.suse>

Hi,

I have discovered that scheduling pass_cprop_hardreg before
pass_thread_prologue_and_epilogue leads to a significant increase in
the number of shrink-wrappings performed.  For one, it solves PR 10474
(at least on x86_64-linux), but it also boosts the number of
shrink-wrappings performed during gcc bootstrap by nearly 80%
(3165->5692 functions).  It is also necessary (although not
sufficient) to perform shrink-wrapping in at least one function in the
povray benchmark.

The reason why it helps so much is that before register allocation
there are instructions moving the value of actual arguments from the
"originally hard" register (e.g. SI, DI, etc.) to a pseudo at the
beginning of each function.  When the argument is live across a
function call, the pseudo is likely to be assigned to a callee-saved
register and then also accessed from that register, even in the first
BB.  That makes the first BB require a prologue, even though the value
could be fetched from the original register.  When we convert all uses
(at least in the first BB) to the original register, the preparatory
stage of shrink-wrapping is often capable of moving the register moves
to a later BB, thus creating fast paths which do not require prologue
and epilogue.

We believe this change in the pipeline should not bring about any
negative effects.  During gcc bootstrap, the number of instructions
changed by pass_cprop_hardreg dropped, but only by 1.2%.
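To make the argument-copy pattern described above concrete, here is a
minimal sketch of the kind of function that can benefit.  The names
`update` and `log_and_double` are hypothetical and are not part of the
patch or its testcase, and `__attribute__((noinline))` is a GCC
extension used only to keep the call from being inlined away.  Whether
shrink-wrapping actually triggers depends on the target and on the
compiler version.

```c
#include <stdio.h>

/* Hypothetical out-of-line callee.  Any real call would do; noinline
   only ensures the call survives optimization.  */
__attribute__((noinline)) static int log_and_double(int *p)
{
    printf("value: %d\n", *p);
    return *p * 2;
}

/* The argument `p` arrives in a hard register (DI on x86_64) and is
   live across the call below, so the register allocator is likely to
   keep it in a callee-saved register.  The early-return path needs
   neither that register nor a stack frame, which makes it a candidate
   for a prologue-free fast path.  */
int update(int *p)
{
    if (!p)
        return -1;      /* fast path: no call is made here */

    int doubled = log_and_double(p);
    *p = doubled;       /* `p` is used again after the call */
    return 0;
}
```

Compiling such a file with the options used by the new testcase
(-O3 -fdump-rtl-pro_and_epilogue) and searching the dump for
"Performing shrink-wrapping" is one way to check whether the fast path
was created.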
We have also run SPEC 2006 CPU benchmarks on recent Intel and AMD
hardware and all run time differences could be attributed to noise.
The changes in binary sizes were also small:

  |                | Trunk produced |         New |        |
  | Benchmark      |    binary size | binary size | % diff |
  |----------------+----------------+-------------+--------|
  | 400.perlbench  |        6219603 |     6136803 |  -1.33 |
  | 401.bzip2      |         359291 |      351659 |  -2.12 |
  | 403.gcc        |       16249718 |    15915774 |  -2.06 |
  | 410.bwaves     |         145249 |      145769 |   0.36 |
  | 416.gamess     |       40269686 |    40270270 |   0.00 |
  | 429.mcf        |          97142 |       97126 |  -0.02 |
  | 433.milc       |         715444 |      713236 |  -0.31 |
  | 434.zeusmp     |        1444596 |     1444676 |   0.01 |
  | 435.gromacs    |        6609207 |     6470039 |  -2.11 |
  | 436.cactusADM  |        4571319 |     4532607 |  -0.85 |
  | 437.leslie3d   |         492197 |      492357 |   0.03 |
  | 444.namd       |        1001921 |     1007001 |   0.51 |
  | 445.gobmk      |        8193495 |     8163839 |  -0.36 |
  | 450.soplex     |        5565070 |     5530734 |  -0.62 |
  | 453.povray     |        7468446 |     7340142 |  -1.72 |
  | 454.calculix   |        8474754 |     8464954 |  -0.12 |
  | 456.hmmer      |        1662315 |     1650147 |  -0.73 |
  | 458.sjeng      |         623065 |      620817 |  -0.36 |
  | 459.GemsFDTD   |        1456669 |     1461573 |   0.34 |
  | 462.libquantum |         249809 |      248401 |  -0.56 |
  | 464.h264ref    |        2784806 |     2772806 |  -0.43 |
  | 465.tonto      |       15511395 |    15480899 |  -0.20 |
  | 470.lbm        |          64327 |       64215 |  -0.17 |
  | 471.omnetpp    |        5325418 |     5293874 |  -0.59 |
  | 473.astar      |         365853 |      363261 |  -0.71 |
  | 481.wrf        |       22002287 |    21950783 |  -0.23 |
  | 482.sphinx3    |        1153616 |     1145248 |  -0.73 |
  | 483.xalancbmk  |       62458676 |    62001540 |  -0.73 |
  |----------------+----------------+-------------+--------|
  | TOTAL          |      221535374 |   220130550 |  -0.63 |

I have successfully bootstrapped and tested the patch on
x86_64-linux.  Is it OK for trunk?  Or should I also examine some
other aspect?

Thanks,

Martin


2013-03-28  Martin Jambor

	PR middle-end/10474
	* passes.c (init_optimization_passes): Move pass_cprop_hardreg before
	pass_thread_prologue_and_epilogue.

testsuite/
	* gcc.dg/pr10474.c: New test.

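For reference, the percentages quoted in this message (the "nearly 80%"
increase in shrink-wrapped functions and the "% diff" column of the
size table) are plain relative changes.  The helper below is only a
sketch of that arithmetic; the name `percent_change` is hypothetical
and does not appear anywhere in GCC.

```c
/* Relative change from `before` to `after`, expressed in percent.
   A negative result means the value shrank.  */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the numbers from this message, percent_change (3165, 5692) gives
the bootstrap shrink-wrapping increase and
percent_change (221535374, 220130550) gives the TOTAL row of the table.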
Index: src/gcc/passes.c
===================================================================
--- src.orig/gcc/passes.c
+++ src/gcc/passes.c
@@ -1630,6 +1630,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_ree);
 	  NEXT_PASS (pass_compare_elim_after_reload);
 	  NEXT_PASS (pass_branch_target_load_optimize1);
+	  NEXT_PASS (pass_cprop_hardreg);
 	  NEXT_PASS (pass_thread_prologue_and_epilogue);
 	  NEXT_PASS (pass_rtl_dse2);
 	  NEXT_PASS (pass_stack_adjustments);
@@ -1637,7 +1638,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_peephole2);
 	  NEXT_PASS (pass_if_after_reload);
 	  NEXT_PASS (pass_regrename);
-	  NEXT_PASS (pass_cprop_hardreg);
 	  NEXT_PASS (pass_fast_rtl_dce);
 	  NEXT_PASS (pass_reorder_blocks);
 	  NEXT_PASS (pass_branch_target_load_optimize2);
Index: src/gcc/testsuite/gcc.dg/pr10474.c
===================================================================
--- /dev/null
+++ src/gcc/testsuite/gcc.dg/pr10474.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-pro_and_epilogue" } */
+
+void f(int *i)
+{
+  if (!i)
+    return;
+  else
+    {
+      __builtin_printf("Hi");
+      *i=0;
+    }
+}
+
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */