From patchwork Mon Oct 21 16:32:37 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin Jambor X-Patchwork-Id: 285231 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 7BC602C0087 for ; Tue, 22 Oct 2013 03:32:50 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; q=dns; s=default; b=rd6Zw1cSFfHeXFDX1UNV25/v0nOcVVXs+Zb+fXGoqVuquU2Nmx RhOLwtCliWrw3wRI4hgDIIFZoXXWGlXMVOJAu0eEQU+qb3sed5dHwFL/50obezoe Jmctan5Q/9GMue+nU9CMu3fEhwqLvGBBR53rRbZ4huDpzKxHP4vxClftI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; s= default; bh=vtd/rsi9yln1ES2JDojkm9q2m1s=; b=L8rgnM54H5r4P/R/f9Ph XBY1r21LV6Cz2fuGFz1b+xj01JNZmyjdmcQ9EaP/r5ZK+LsgZhY97UzKtf7OmxH8 51z9pSFVU4IE7wootegE80X6la7/4M6PytI+lAmJGs5UXabkqK7ZmujC5mDYV41o vm8Fjb9sUyLvJB+Wz4BZw34= Received: (qmail 8652 invoked by alias); 21 Oct 2013 16:32:43 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 8635 invoked by uid 89); 21 Oct 2013 16:32:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL, BAYES_20 autolearn=ham version=3.3.2 X-HELO: mx2.suse.de Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 21 Oct 2013 16:32:40 +0000 Received: from relay1.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id E7D47A6382; Mon, 21 Oct 2013 18:32:37 +0200 (CEST) Date: Mon, 21 Oct 2013 18:32:37 +0200 From: Martin Jambor To: GCC Patches Cc: Vladimir Makarov Subject: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping Message-ID: <20131021163237.GA11578@virgil.suse> Mail-Followup-To: GCC Patches , Vladimir Makarov MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Hi, in spring I have suggested to shedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue in order to create many more shrink-wrapping opportunities. The problem is that formal arguments of a functions which need to be saved across calls on slow paths often get assigned callee saved registers and the very first BB thus needs prologue, which makes any attempt at shrink-wrapping difficult. Steven suggested that splitting live-ranges of these registers might be a better approach (and Vlad suggested that I should look at http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01137.html, a big thanks to both) and this is what the patch below does. The basic idea is to split live ranges of problematic pseudos in the common dominator of all non-sibling calls and all uses of those pseudos in BBs which are reachable from calls, if such dominator is not in a loop and if it does not post-dominate the entry BB. As far as the number of performed shrink-wrappings is concerned, it has even bigger effect to scheduling pass_cprop_hardreg earlier that I have proposed in spring. Apart from looking at bootstrap and SPEC 2006 (at -Ofast) I have tried compiling Python and Ruby (with whatever their default flags were, I believe mostly -O3) because David suggested at his Cauldron talk last year that shrink-wrapping may be important: Number of performed shrink-wrappings: | Source | Trunk | With patch | % | |-------------------------+-------+------------+---------| | SPEC 2006 Int | 734 | 1670 | +127.52 | | SPEC 2006 FP (*) | 575 | 1022 | +77.74 | | C/CPP bootstrap stage 2 | 1748 | 3015 | +72.48 | | Python 2.7.5 | 231 | 401 | +73.59 | | Python 3.3.2 | 234 | 416 | +77.78 | | Ruby 2.0.0-p247 | 331 | 464 | +40.18 | (*) I forgot to increase the stack limit and I believe that is the reason why 410.bwaves, 416.gamess, 447.dealII and 481.wrf SPEC 2006 FP benchmarks failed, regardless of my patch, but I have not investigated what exactly happened, so there might have even been compile-time issues. As far as run-times are concerned, I have tried running the benchmarks from hg.python.org/benchmarks but I do not have anything to report. David claimed that shrink-wrapping in function lookdict_string is likely to help but this patch alone does not make that happen, I am investigating what else needs to be done for this to happen. I have not yet tried benchmarking ruby. Spec 2006 (*) run times are a bit encouraging. Run-times (seconds, median of three runs): | Benchmark | Trunk | Live range splitting | % | |----------------+-------+----------------------+-------| | 400.perlbench | 294 | 290 | -1.36 | | 401.bzip2 | 384 | 387 | +0.78 | | 403.gcc | 239 | 237 | -0.84 | | 429.mcf | 227 | 228 | +0.44 | | 445.gobmk | 392 | 390 | -0.51 | | 456.hmmer | 343 | 342 | -0.29 | | 458.sjeng | 418 | 411 | -1.67 | | 462.libquantum | 287 | 288 | +0.35 | | 464.h264ref | 417 | 418 | +0.24 | | 471.omnetpp | 275 | 280 | +1.82 | | 473.astar | 312 | 295 | -5.45 | | 483.xalancbmk | 183 | 183 | 0.00 | |----------------+-------+----------------------+-------| | 433.milc | 336 | 335 | -0.30 | | 434.zeusmp | 247 | 246 | -0.40 | | 435.gromacs | 259 | 259 | 0.00 | | 436.cactusADM | 192 | 186 | -3.12 | | 437.leslie3d | 227 | 226 | -0.44 | | 444.namd | 315 | 315 | 0.00 | | 450.soplex | 202 | 200 | -0.99 | | 453.povray | 136 | 134 | -1.47 | | 454.calculix | 300 | 300 | 0.00 | | 459.GemsFDTD | 273 | 267 | -2.20 | | 465.tonto | 685 | 685 | 0.00 | | 470.lbm | 212 | 213 | +0.47 | | 482.sphinx3 | 363 | 360 | -0.83 | The patch is speculative, quite a few modifications do not lead to shrink wrappings, as outlined in the following table which shows how many of the changed functions still needed their prologue in the first BB. Number of functions touched: | | Modified | | | | Source | functions | In vain | % | |-------------------------+-----------+---------+-------| | SPEC 2006 Int | 2004 | 745 | 37.18 | | SPEC 2006 FP (*) | 1491 | 881 | 59.09 | | C/C++ bootstrap stage 2 | 2864 | 935 | 32.65 | | Python 2.7.5 | 351 | 121 | 34.47 | | Python 3.3.2 | 410 | 151 | 36.83 | | Ruby 2.0.0-p247 | 336 | 132 | 39.29 | I believe we can do even more shrink-wrappings and thus make the above percentages smaller if attempt to fix a similar problem with the return value. The parameters are however clearly the biggest obstacle and need to be dealt with somehow. The patch also adds quite some more computations, especially for functions which are not rejected early on. The last table below summarizes how much extra work is done. Total is the number of functions on which split_live_ranges_for_shrink_wrap was called, IIUC function ira, that excludes large functions. Number of functions for which: 1 - all BBs were scanned for calls 2 - the first BB was scanned for register moves 3 - dominators were calculated 4 - loops were calculated 5 - post-dominators were calculated 6 - which were modified | Source | Total | 1 | % | 2 | % | 3 | % | 4 | % | 5 | % | 6 | % | |-------------------------+-------+-------+-------+-------+-------+------+-------+------+-------+------+-------+------+------| | SPEC 2006 Int | 30816 | 11888 | 38.58 | 9902 | 32.13 | 7889 | 25.60 | 2911 | 9.45 | 2692 | 8.74 | 2004 | 6.50 | | SPEC 2006 FP (*) | 17957 | 9239 | 51.45 | 7016 | 39.07 | 6507 | 36.24 | 3640 | 20.27 | 3496 | 19.47 | 1491 | 8.30 | | C/C++ bootstrap stage 2 | 48163 | 19756 | 41.02 | 16211 | 33.66 | 9598 | 19.93 | 4535 | 9.42 | 4174 | 8.67 | 2864 | 5.95 | | Python 2.7.5 | 5472 | 2911 | 53.20 | 2437 | 44.54 | 1987 | 36.31 | 632 | 11.55 | 587 | 10.73 | 351 | 6.41 | | Python 3.3.2 | 6156 | 3246 | 52.73 | 2656 | 43.14 | 2165 | 35.17 | 669 | 10.87 | 629 | 10.22 | 410 | 6.66 | | Ruby 2.0.0-p247 | 7866 | 3834 | 48.74 | 3333 | 42.37 | 2588 | 32.90 | 797 | 10.13 | 731 | 9.29 | 336 | 4.27 | The patch itself is below. It passes bootstrap and testing on x86_64-linux. Because it is basically my first RTL-optimization patch, I expect loads of comments and requests for changes. Nevertheless, I think it would be nice to have it (or some later version) in the upcoming release. Thanks, Martin 2013-10-18 Martin Jambor PR rtl-optimization/10474 * ira.c (interesting_dest_for_shprep): New function. (split_live_ranges_for_shrink_wrap): Likewise. (ira): Call split_live_ranges_for_shrink_wrap. testsuite/ * gcc.dg/pr10474.c: New testcase. * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise. * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise. diff --git a/gcc/ira.c b/gcc/ira.c index 203fbff..fe208e2 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -4314,6 +4314,197 @@ find_moveable_pseudos (void) free_dominance_info (CDI_DOMINATORS); } + +/* If insn is interesting for parameter range-splitting shring-wrapping + preparation, i.e. it is a single set from a hard register to a pseudo, which + is live at CALL_DOM, return the destination. Otherwise return NULL. */ + +static rtx +interesting_dest_for_shprep (rtx insn, basic_block call_dom) +{ + rtx set = single_set (insn); + if (!set) + return NULL; + rtx src = SET_SRC (set); + rtx dest = SET_DEST (set); + if (!REG_P (src) || !HARD_REGISTER_P (src) + || !REG_P (dest) || HARD_REGISTER_P (dest) + || (call_dom && !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest)))) + return NULL; + return dest; +} + +/* Split live ranges of pseudos that are loaded from hard registers in the + first BB in a BB that dominates all non-sibling call if such a BB can be + found and is not in a loop. */ + +static void +split_live_ranges_for_shrink_wrap (void) +{ + basic_block bb, call_dom = NULL; + basic_block first = single_succ (ENTRY_BLOCK_PTR); + rtx insn, last_interesting_insn = NULL; + bitmap_head need_new, reachable; + vec queue; + + if (!flag_shrink_wrap) + return; + + bitmap_initialize (&need_new, 0); + bitmap_initialize (&reachable, 0); + queue.create (n_basic_blocks); + + FOR_EACH_BB (bb) + FOR_BB_INSNS (bb, insn) + if (CALL_P (insn) && !SIBLING_CALL_P (insn)) + { + if (bb == first) + { + bitmap_clear (&need_new); + bitmap_clear (&reachable); + queue.release (); + return; + } + + bitmap_set_bit (&need_new, bb->index); + bitmap_set_bit (&reachable, bb->index); + queue.quick_push (bb); + break; + } + + if (queue.is_empty ()) + { + bitmap_clear (&need_new); + bitmap_clear (&reachable); + queue.release (); + return; + } + + while (!queue.is_empty ()) + { + edge e; + edge_iterator ei; + + bb = queue.pop (); + FOR_EACH_EDGE (e, ei, bb->succs) + if (e->dest != EXIT_BLOCK_PTR + && bitmap_set_bit (&reachable, e->dest->index)) + queue.quick_push (e->dest); + } + queue.release (); + + FOR_BB_INSNS (first, insn) + { + rtx dest = interesting_dest_for_shprep (insn, NULL); + if (!dest) + continue; + + if (DF_REG_DEF_COUNT (REGNO (dest)) > 1) + { + bitmap_clear (&need_new); + bitmap_clear (&reachable); + return; + } + + for (df_ref use = DF_REG_USE_CHAIN (REGNO(dest)); + use; + use = DF_REF_NEXT_REG (use)) + { + if (NONDEBUG_INSN_P (DF_REF_INSN (use)) + && GET_CODE (DF_REF_REG (use)) == SUBREG) + { + /* This is necessary to avoid hitting an assert at + postreload.c:2294 in libstc++ testcases on x86_64-linux. I'm + not really sure what the probblem actually is there. */ + bitmap_clear (&need_new); + bitmap_clear (&reachable); + return; + } + + rtx uin = DF_REF_INSN (use); + int ubbi = BLOCK_FOR_INSN (uin)->index; + if (bitmap_bit_p (&reachable, ubbi)) + bitmap_set_bit (&need_new, ubbi); + } + last_interesting_insn = insn; + } + + bitmap_clear (&reachable); + if (!last_interesting_insn) + { + bitmap_clear (&need_new); + return; + } + + calculate_dominance_info (CDI_DOMINATORS); + call_dom = nearest_common_dominator_for_set (CDI_DOMINATORS, &need_new); + bitmap_clear (&need_new); + if (call_dom == first) + goto out; + + loop_optimizer_init (AVOID_CFG_MODIFICATIONS); + while (bb_loop_depth (call_dom) > 0) + call_dom = get_immediate_dominator (CDI_DOMINATORS, call_dom); + loop_optimizer_finalize (); + + if (call_dom == first) + goto out; + + calculate_dominance_info (CDI_POST_DOMINATORS); + if (dominated_by_p (CDI_POST_DOMINATORS, first, call_dom)) + { + free_dominance_info (CDI_POST_DOMINATORS); + goto out; + } + free_dominance_info (CDI_POST_DOMINATORS); + + if (dump_file) + fprintf (dump_file, "Will split live ranges of parameters at BB %i\n", + call_dom->index); + + FOR_BB_INSNS (first, insn) + { + rtx dest = interesting_dest_for_shprep (insn, call_dom); + if (!dest) + continue; + + rtx newreg = NULL_RTX; + df_ref use, next; + for (use = DF_REG_USE_CHAIN (REGNO(dest)); use; use = next) + { + rtx uin = DF_REF_INSN (use); + next = DF_REF_NEXT_REG (use); + + basic_block ubb = BLOCK_FOR_INSN (uin); + if (ubb == call_dom + || dominated_by_p (CDI_DOMINATORS, ubb, call_dom)) + { + if (!newreg) + newreg = ira_create_new_reg (dest); + validate_change (uin, DF_REF_LOC (use), newreg, true); + } + } + + if (newreg) + { + rtx new_move = gen_move_insn (newreg, dest); + emit_insn_after (new_move, bb_note (call_dom)); + if (dump_file) + { + fprintf (dump_file, "Split live-range of register "); + print_rtl_single (dump_file, dest); + } + } + + if (insn == last_interesting_insn) + break; + } + apply_change_group(); + + out: + free_dominance_info (CDI_DOMINATORS); +} + /* Perform the second half of the transformation started in find_moveable_pseudos. We look for instances where the newly introduced pseudo remains unallocated, and remove it by moving the definition to @@ -4522,7 +4713,10 @@ ira (FILE *f) allocation because of -O0 usage or because the function is too big. */ if (ira_conflicts_p) - find_moveable_pseudos (); + { + find_moveable_pseudos (); + split_live_ranges_for_shrink_wrap (); + } max_regno_before_ira = max_reg_num (); ira_setup_eliminable_regset (true); diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c new file mode 100644 index 0000000..fe497c2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-rtl-ira -fdump-rtl-pro_and_epilogue" } */ + +int __attribute__((noinline, noclone)) +foo (int a) +{ + return a + 5; +} + +static int g; + +int __attribute__((noinline, noclone)) +bar (int a) +{ + int r; + + if (a) + { + r = foo (a); + g = r + a; + } + else + r = a+1; + return r; +} + +/* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira" } } */ +/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" } } */ +/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" } } */ +/* { dg-final { cleanup-rtl-dump "ira" } } */ +/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */ diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c new file mode 100644 index 0000000..872a757 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-rtl-ira -fdump-rtl-pro_and_epilogue" } */ + +int __attribute__((noinline, noclone)) +foo (int a) +{ + return a + 5; +} + +static int g; + +int __attribute__((noinline, noclone)) +bar (int a) +{ + int r; + + if (a) + { + r = a; + while (r < 500) + if (r % 2) + r = foo (r); + else + r = foo (r+1); + g = r + a; + } + else + r = g+1; + return r; +} + +/* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira" } } */ +/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" } } */ +/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" } } */ +/* { dg-final { cleanup-rtl-dump "ira" } } */ +/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */ diff --git a/gcc/testsuite/gcc.dg/pr10474.c b/gcc/testsuite/gcc.dg/pr10474.c new file mode 100644 index 0000000..ee085c3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr10474.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-rtl-pro_and_epilogue" } */ + +void f(int *i) +{ + if (!i) + return; + else + { + __builtin_printf("Hi"); + *i=0; + } +} + +/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" } } */ +/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */