From patchwork Thu Oct 11 06:50:07 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bin Cheng X-Patchwork-Id: 190818 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 272732C008B for ; Thu, 11 Oct 2012 17:55:33 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1350543334; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Received: From:To:Cc:References:In-Reply-To:Subject:Date:Message-ID: MIME-Version:Content-Type:Mailing-List:Precedence:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:Sender: Delivered-To; bh=4WApTwmQIujm5ygoTsakXfmBemY=; b=bQCa72om/YYRnkA hvu5AfqIVHzrr6mJBwjY+AFq6tyuaAz/J2Qww1WUPNTaTaA0pLSn8YiHwLPBk/yb lkxis6rVpu7BpkCSVCc42qUOnByLnaZxbtjPf8/LRmLrioajvgsVD0MbCocjB1UL 9XN96c7AD5TjZd8ffBJbZU5lFGHc= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:From:To:Cc:References:In-Reply-To:Subject:Date:Message-ID:MIME-Version:X-MC-Unique:Content-Type:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=xRM7e+TJl6nMB5fUvU3N15JVb6k2ZQb0HCfeFIpuMCiyA2WiIOYvW36g1n3Yhq BZVLRi7r0vgAm5TfOvi18+5AvS80Oad5vr5z3rKCr6MBaTq32FwgI4Uny0mNgduD HdTidV/XmreY3ELpRrzzzaV2spus9j++l5iIvneKpJ0M4=; Received: (qmail 964 invoked by alias); 11 Oct 2012 06:55:27 -0000 Received: (qmail 953 invoked by uid 22791); 11 Oct 2012 06:55:26 -0000 X-SWARE-Spam-Status: No, hits=-0.6 required=5.0 tests=AWL, BAYES_50, KHOP_RCVD_UNTRUST, KHOP_SPAMHAUS_DROP, KHOP_THREADED, MSGID_MULTIPLE_AT, RCVD_IN_DNSWL_LOW, TW_DB X-Spam-Check-By: sourceware.org Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 11 Oct 2012 06:55:16 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Thu, 11 Oct 2012 07:55:13 +0100 Received: from Binsh02 ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.0); Thu, 11 Oct 2012 07:55:12 +0100 From: "Bin Cheng" To: "'Jeff Law'" Cc: "'Steven Bosscher'" , References: <50655073.e54c420a.651e.ffffac0fSMTPIN_ADDED@mx.google.com> <00b401cd9e0c$d7982140$86c863c0$@cheng@arm.com> <506AE4F1.5030807@redhat.com> In-Reply-To: <506AE4F1.5030807@redhat.com> Subject: RE: [PATCH RFA] Implement register pressure directed hoist pass Date: Thu, 11 Oct 2012 14:50:07 +0800 Message-ID: <000701cda77c$a8b88260$fa298720$@cheng@arm.com> MIME-Version: 1.0 X-MC-Unique: 112101107551300201 X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi Jeff, Steven, This is the updated patch according to your comments. The main changes includes: 1. Enable the option for all target at Os level by default. 2. Add a target independent test case. 3. Add comments on how the algorithm works. 4. Simplify the calculation of register pressure info by using DF caches. Jeff, as for accurately simulating the change of register pressure made by hoisting expression, I explained a little in previous message. According to observation, cases in which register pressure decreases are relatively rare, so this patch does not implement it. I can continue to improve code hoisting that way in the future. I bootstrapped it on x86_64 and x86, also tested on x86_64/x86/arm. I recollected size info for CSiBE on miscellaneous targets as following: improvement ARM 0.12% Thumb1 0.13% mips 0.24% powerpc 0.59% Thumb2 X x86 X x86_64 X (X means no obvious effect, unfortunately.) Also, since this is the simplification of previous patch, I think there is no slowdown in compilation. Please review. Thanks. And updated ChangeLog entry: 2012-10-11 Bin Cheng * common.opt (flag_ira_hoist_pressure): New. * doc/invoke.texi (-fira-hoist-pressure): Describe. * ira-costs.c (ira_set_pseudo_classes): New parameter. * ira.h: Update copyright dates. (ira_set_pseudo_classes): Update prototype. * haifa-sched.c (sched_init): Update call. * ira.c (ira): Update call. * regmove.c: Update copyright dates. (regmove_optimize): Update call. * loop-invariant.c: Update copyright dates. (move_loop_invariants): Update call. * gcse.c: Update copyright dates. (struct bb_data): New structure. (BB_DATA): New macro. (curr_bb, curr_reg_pressure): New static variables. (should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p. Change parameter expr_index to expr. Change parameter visited. New parameters pressure_class, nregs and hoisted_bbs. Use reg pressure to determine the distance expr can be hoisted. (hoist_code): Use reg pressure to direct the hoist process. (get_regno_pressure_class, get_pressure_class_and_nregs) (change_pressure, calculate_bb_reg_pressure): New. (one_code_hoisting_pass): Calculate register pressure. Allocate and free data. gcc/testsuite/ChangeLog 2012-10-11 Bin Cheng * testsuite/gcc.dg/hoist-register-pressure.c: New test. > -----Original Message----- > From: Jeff Law [mailto:law@redhat.com] > Sent: Tuesday, October 02, 2012 8:58 PM > To: Bin Cheng > Cc: 'Steven Bosscher'; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass > > On 09/29/2012 12:37 AM, Bin Cheng wrote: > > Hi Steven, > > > > This is the updated patch according to your comments. Please review. > > I also re-collected code size data and found it is improved by about > > 0.24% for mips, which is better than previous data. I believe this > > should be caused by recent changes in trunk, rather than by using DF > > caches to calculate register pressure. > > > > Thanks. > > > > 2012-09-29 Bin Cheng > > > > * common.opt (flag_ira_hoist_pressure): New. > > * doc/invoke.texi (-fira-hoist-pressure): Describe. > > * ira-costs.c (ira_set_pseudo_classes): New parameter. > > * ira.h (ira_set_pseudo_classes): Update prototype. > > * haifa-sched.c (sched_init): Update call. > > * ira.c (ira): Update call. > > * regmove.c (regmove_optimize): Update call. > > * loop-invariant.c (move_loop_invariants): Update call. > > * gcse.c (struct bb_data): New structure. > > (BB_DATA): New macro. > > (curr_bb, curr_regs_live, curr_reg_pressure, regs_set) > > (n_regs_set): New static variables. > > (hoist_expr_reaches_here_p): Use reg pressure to determine the > > distance expr can be hoisted. > > (hoist_code): Use reg pressure to direct the hoist process. > > (get_regno_pressure_class, get_pressure_class_and_nregs) > > (change_pressure, mark_regno_live, mark_regno_death) > > (mark_reg_death, mark_reg_store, calculate_bb_reg_pressure): New. > > (one_code_hoisting_pass): Calculate register pressure. Free data. > > * config/arm/arm.c (arm_option_override): Set flag_ira_hoist_pressure > > on Thumb1 when optimizing for size. > > > > > > hoist-reg-pressure-20120929.txt > > +@item -fira-hoist-pressure > > +@opindex fira-hoist-pressure > > +Use IRA to evaluate register pressure in hoist pass for decisions to hoist > > +expressions. This option usually results in generation of smaller code on > > +RISC machines, but it can slow the compiler down. > I wouldn't use CISC/RISC here; I'd just say it usually results in > smaller code. > > You need to update the copyright year in gcse.c, ira.h, regmove.c, and > loop-invariant.c. > > > + /* Only decrease distance if bb has high register pressure or EXPR > > + is const expr, otherwise EXPR can be hoisted through bb without > > + cost. */ > ?!? This comment makes no sense to me. To accurately know how hoisting > an expression affects pressure you have to look at the inputs and output > and see how their lifetime has changed. > > In general: > > For inputs, hoisting *may* reduce pressure. You really have to look at > how the life of the input changes based on the new location of the insn. > For example, if the input's lifetime is unchanged (say perhaps because > it was live after the insn we want to hoist), then hoisting will have no > impact. Otherwise the input's life is shortened, but to know by how much > you have to determine whether the new death of the input occurs (it may > still die in the hoisted insn or it may die elsewhere). > > For an output, hoisting will (effectively) always extend the lifetime. > > I've speculated that the right way to deal with register pressure in > code motion is to actually build the dependency graph and use that to > guide the code motions. I've never cobbled together any real code to do > this though. > > Can we find a better name for hoist_expr_reaches_here_p since it's no > longer just dealing with reachability -- it has heuristics now for > profitability as well. > > > @@ -2863,7 +2909,8 @@ static int > > if (visited == NULL) > > { > > visited_allocated_locally = 1; > > - visited = XCNEWVEC (char, last_basic_block); > > + visited = sbitmap_alloc (last_basic_block); > > + sbitmap_zero (visited); > > } > What's the purpose behind changing visited from a simple array to a > sbitmap? I'm not objecting, but would like to hear the rationale behind > that change. I'll also note it wasn't mentioned in the ChangeLog. > > Similarly what's the rationale behind passing the expression itself > rather than just its index? I don't see where we need to use anything > other than the index in this code. And again, this change isn't > mentioned in the ChangeLog. > > > + /* Considered invariant insns have only one set. */ > > + gcc_assert (set != NULL_RTX); > > + reg = SET_DEST (set); > > + if (GET_CODE (reg) == SUBREG) > > + reg = SUBREG_REG (reg); > > + if (MEM_P (reg)) > > + { > > + *nregs = 0; > > + pressure_class = NO_REGS; > > + } > Don't you need to look at the addresses within the MEM? > > > > > Index: gcc/config/arm/arm.c > > =================================================================== > > --- gcc/config/arm/arm.c (revision 191816) > > +++ gcc/config/arm/arm.c (working copy) > > @@ -2021,6 +2021,11 @@ arm_option_override (void) > > && current_tune->num_prefetch_slots > 0) > > flag_prefetch_loop_arrays = 1; > > > > + /* Enable register pressure hoist when optimizing for size on Thumb1 set. > */ > > + if (TARGET_THUMB1 && optimize_function_for_size_p (cfun) > > + && flag_ira_hoist_pressure == -1) > > + flag_ira_hoist_pressure = 1; > I'd rather see this enabled for all targets when optimizing for size; > that guarantees more testing. Even if it doesn't help a particular > target, as long as it isn't hurting we're better off enabling it. > > I don't see any testsuite additions -- it'd really help long term to > have some tests for this stuff. > > You should address the issues noted above and repost for final review > before likely integration. > > > > jeff > Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi (revision 192194) +++ gcc/doc/invoke.texi (working copy) @@ -372,7 +372,7 @@ Objective-C and Objective-C++ Dialects}. -finline-small-functions -fipa-cp -fipa-cp-clone @gol -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol -fira-algorithm=@var{algorithm} @gol --fira-region=@var{region} @gol +-fira-region=@var{region} -fira-hoist-pressure @gol -fira-loop-pressure -fno-ira-share-save-slots @gol -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol -fivopts -fkeep-inline-functions -fkeep-static-consts @gol @@ -6996,6 +6996,14 @@ This typically results in the smallest code size, @end table +@item -fira-hoist-pressure +@opindex fira-hoist-pressure +Use IRA to evaluate register pressure in the code hoisting pass for +decisions to hoist expressions. This option usually results in smaller +code, but it can slow the compiler down. + +This option is enabled at level @option{-Os} for all targets. + @item -fira-loop-pressure @opindex fira-loop-pressure Use IRA to evaluate register pressure in loops for decisions to move Index: gcc/testsuite/gcc.dg/hoist-register-pressure.c =================================================================== --- gcc/testsuite/gcc.dg/hoist-register-pressure.c (revision 0) +++ gcc/testsuite/gcc.dg/hoist-register-pressure.c (revision 0) @@ -0,0 +1,31 @@ +/* { dg-options "-Os -fdump-rtl-hoist" } */ +/* { dg-final { scan-rtl-dump "PRE/HOIST: end of bb .* copying expression" "hoist" } } */ + +#define BUF 100 +int a[BUF]; + +void com (int); +void bar (int); + +int foo (int x, int y, int z) +{ + /* x+y won't be hoisted without defaultly enabled "-fira-hoist-pressure", + because its rtx_cost is too small. */ + if (z) + { + a[1] = a[0] + a[2]; + a[2] = a[1] + a[3]; + a[3] = a[2] + a[4]; + a[4] = a[3] + a[5]; + a[5] = a[4] + a[6]; + a[6] = a[5] + a[7]; + a[7] = a[6] + a[8]; + com (x+y); + } + else + { + bar (x+y); + } + + return 0; +} Index: gcc/haifa-sched.c =================================================================== --- gcc/haifa-sched.c (revision 192194) +++ gcc/haifa-sched.c (working copy) @@ -6629,7 +6629,7 @@ sched_init (void) /* We need info about pseudos for rtl dumps about pseudo classes and costs. */ regstat_init_n_sets_and_refs (); - ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL); + ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL); sched_regno_pressure_class = (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class)); for (i = 0; i < max_regno; i++) Index: gcc/regmove.c =================================================================== --- gcc/regmove.c (revision 192194) +++ gcc/regmove.c (working copy) @@ -1,6 +1,7 @@ /* Move registers around to reduce number of move instructions needed. Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, - 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 + 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, + 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -1237,7 +1238,7 @@ regmove_optimize (void) regstat_compute_ri (); if (flag_ira_loop_pressure) - ira_set_pseudo_classes (dump_file); + ira_set_pseudo_classes (true, dump_file); regno_src_regno = XNEWVEC (int, nregs); for (i = nregs; --i >= 0; ) Index: gcc/gcse.c =================================================================== --- gcc/gcse.c (revision 192194) +++ gcc/gcse.c (working copy) @@ -1,6 +1,6 @@ /* Partial redundancy elimination / Hoisting for RTL. Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, - 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc. + 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -20,9 +20,11 @@ along with GCC; see the file COPYING3. If not see /* TODO - reordering of memory allocation and freeing to be more space efficient - - do rough calc of how many regs are needed in each block, and a rough - calc of how many regs are available in each class and use that to - throttle back the code in cases where RTX_COST is minimal. + - simulate register pressure change of each basic block accurately during + hoist process. But I doubt the benefit since most expressions hoisted + are constant or address, which usually won't reduce register pressure. + - calc rough register pressure information and use the info to drive all + kinds of code motion(including code hoisting) in a unified way. */ /* References searched while implementing this. @@ -141,11 +143,12 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic-core.h" #include "toplev.h" +#include "hard-reg-set.h" #include "rtl.h" #include "tree.h" #include "tm_p.h" #include "regs.h" -#include "hard-reg-set.h" +#include "ira.h" #include "flags.h" #include "insn-config.h" #include "recog.h" @@ -412,6 +415,22 @@ static bool doing_code_hoisting_p = false; /* For available exprs */ static sbitmap *ae_kill; +/* Data stored for each basic block. */ +struct bb_data +{ + /* Maximal register pressure inside basic block for given register class + (defined only for the pressure classes). */ + int max_reg_pressure[N_REG_CLASSES]; +}; + +#define BB_DATA(bb) ((struct bb_data *) (bb)->aux) + +static basic_block curr_bb; + +/* Currently register pressure for each pressure class. */ +static int curr_reg_pressure[N_REG_CLASSES]; + + static void compute_can_copy (void); static void *gmalloc (size_t) ATTRIBUTE_MALLOC; static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC; @@ -460,9 +479,11 @@ static void alloc_code_hoist_mem (int, int); static void free_code_hoist_mem (void); static void compute_code_hoist_vbeinout (void); static void compute_code_hoist_data (void); -static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *, - int, int *); +static int should_hoist_expr_to_dom (basic_block, struct expr *, basic_block, + sbitmap, int, int *, enum reg_class, + int *, bitmap); static int hoist_code (void); +static enum reg_class get_pressure_class_and_nregs (rtx insn, int *nregs); static int one_code_hoisting_pass (void); static rtx process_insert_insn (struct expr *); static int pre_edge_insert (struct edge_list *, struct expr **); @@ -1857,7 +1878,7 @@ prune_expressions (bool pre_p) a basic block we should account for any side-effects of a subsequent jump instructions that could clobber the expression. It would be best to implement this check along the lines of - hoist_expr_reaches_here_p where the target block is already known + should_hoist_expr_to_dom where the target block is already known and, hence, there's no need to conservatively prune expressions on "intermediate" set-and-jump instructions. */ FOR_EACH_EDGE (e, ei, bb->preds) @@ -2825,11 +2846,16 @@ compute_code_hoist_data (void) fprintf (dump_file, "\n"); } -/* Determine if the expression identified by EXPR_INDEX would - reach BB unimpared if it was placed at the end of EXPR_BB. - Stop the search if the expression would need to be moved more - than DISTANCE instructions. +/* Determine if the expression EXPR should be hoisted to EXPR_BB up in the + flow graph, given if it can reach BB unimpared. Stop the search if the + expression would need to be moved more than DISTANCE instructions. + PRESSURE_CLASS and NREGS are register class and number of hard registers + for storing EXPR. + + HOISTED_BBS points to a bitmap indicating basic blocks through which + EXPR is hoisted. + It's unclear exactly what Muchnick meant by "unimpared". It seems to me that the expression must either be computed or transparent in *every* block in the path(s) from EXPR_BB to BB. Any other definition @@ -2841,18 +2867,30 @@ compute_code_hoist_data (void) paths. */ static int -hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb, - char *visited, int distance, int *bb_size) +should_hoist_expr_to_dom (basic_block expr_bb, struct expr *expr, + basic_block bb, sbitmap visited, int distance, + int *bb_size, enum reg_class pressure_class, + int *nregs, bitmap hoisted_bbs) { + unsigned int i; edge pred; edge_iterator ei; + sbitmap_iterator sbi; int visited_allocated_locally = 0; /* Terminate the search if distance, for which EXPR is allowed to move, is exhausted. */ if (distance > 0) { - distance -= bb_size[bb->index]; + /* Let EXPR be hoisted through basic block at no cost if the block + has low register pressure. An exception is constant expression, + becasue hoisting constant expr aggresively results in worse code + as observed. */ + if (!flag_ira_hoist_pressure + || (BB_DATA (bb)->max_reg_pressure[pressure_class] + >= ira_class_hard_regs_num[pressure_class] + || CONST_INT_P (expr->expr))) + distance -= bb_size[bb->index]; if (distance <= 0) return 0; @@ -2863,7 +2901,8 @@ static int if (visited == NULL) { visited_allocated_locally = 1; - visited = XCNEWVEC (char, last_basic_block); + visited = sbitmap_alloc (last_basic_block); + sbitmap_zero (visited); } FOR_EACH_EDGE (pred, ei, bb->preds) @@ -2874,23 +2913,37 @@ static int break; else if (pred_bb == expr_bb) continue; - else if (visited[pred_bb->index]) + else if (TEST_BIT (visited, pred_bb->index)) continue; - - else if (! TEST_BIT (transp[pred_bb->index], expr_index)) + else if (! TEST_BIT (transp[pred_bb->index], expr->bitmap_index)) break; - /* Not killed. */ else { - visited[pred_bb->index] = 1; - if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb, - visited, distance, bb_size)) + SET_BIT (visited, pred_bb->index); + if (! should_hoist_expr_to_dom (expr_bb, expr, pred_bb, + visited, distance, bb_size, + pressure_class, nregs, hoisted_bbs)) break; } } if (visited_allocated_locally) - free (visited); + { + /* If EXPR can be hoisted to expr_bb, record basic blocks through + which EXPR is hoisted in hoisted_bbs. Also update register + pressure for basic blocks newly added in hoisted_bbs. */ + if (flag_ira_hoist_pressure && !pred) + { + EXECUTE_IF_SET_IN_SBITMAP (visited, 0, i, sbi) + if (!bitmap_bit_p (hoisted_bbs, i)) + { + bitmap_set_bit (hoisted_bbs, i); + BB_DATA (BASIC_BLOCK (i))->max_reg_pressure[pressure_class] + += *nregs; + } + } + sbitmap_free (visited); + } return (pred == NULL); } @@ -2907,8 +2960,44 @@ find_occr_in_bb (struct occr *occr, basic_block bb return occr; } -/* Actually perform code hoisting. */ +/* Actually perform code hoisting. + The code hoisting pass hoists multiple computations of expression + to dominator basic block, like from b2/b3 to b1 as depicted below: + + b1 ------ + /\ | + / \ | + bx by distance + / \ | + / \ | + b2 b3 ------ + + Unfortunately code hoisting generally extend live range of output + pseudo register, which increases register pressure and hurts register + allocation. To address this issue, an attribute MAX_DISTANCE is + computed and attached to each expression. The attribute is computed + from rtx cost of corresponding expression and it's used to control + how long the expression can be hoisted up in flow graph. As expression + is hoisted up in flow graph, GCC decreases its DISTANCE and stops the + hoist if DISTANCE reaches 0. + + Option "-fira-hoist-pressure" implements register pressure directed + hoist based on upper method. The rationale is: + 1. Calculate register pressure for each basic block by reusing IRA + facility. + 2. When expression is hoisted through one basic block, GCC checks + register pressure of the basic block and decrease DISTANCE only + when the register pressure is high. In other words, expression + will be hoisted through basic block with low register pressure + at no cost. + 3. Update register pressure information for basic blocks through + which expression is hoisted. + TODO: It is possible to have register pressure decreased because + of shrinked live range of input pseudo register when hoisting an + expression. For now, this effect is not simulated and we just + increase register pressure for hoisted expressions. */ + static int hoist_code (void) { @@ -2916,12 +3005,18 @@ hoist_code (void) VEC (basic_block, heap) *dom_tree_walk; unsigned int dom_tree_walk_index; VEC (basic_block, heap) *domby; - unsigned int i,j; + unsigned int i, j, k; struct expr **index_map; struct expr *expr; int *to_bb_head; int *bb_size; int changed = 0; + struct bb_data *data; + /* Basic blocks that have occurrences reachable from BB. */ + bitmap from_bbs; + /* Basic blocks through which expr is hoisted. */ + bitmap hoisted_bbs = NULL; + bitmap_iterator bi; /* Compute a mapping from expression number (`bitmap_index') to hash table entry. */ @@ -2959,6 +3054,10 @@ hoist_code (void) && (EDGE_SUCC (ENTRY_BLOCK_PTR, 0)->dest == ENTRY_BLOCK_PTR->next_bb)); + from_bbs = BITMAP_ALLOC (NULL); + if (flag_ira_hoist_pressure) + hoisted_bbs = BITMAP_ALLOC (NULL); + dom_tree_walk = get_all_dominated_blocks (CDI_DOMINATORS, ENTRY_BLOCK_PTR->next_bb); @@ -2977,12 +3076,12 @@ hoist_code (void) { if (TEST_BIT (hoist_vbeout[bb->index], i)) { + int nregs = 0; + enum reg_class pressure_class = NO_REGS; /* Current expression. */ struct expr *expr = index_map[i]; /* Number of occurrences of EXPR that can be hoisted to BB. */ int hoistable = 0; - /* Basic blocks that have occurrences reachable from BB. */ - bitmap_head _from_bbs, *from_bbs = &_from_bbs; /* Occurrences reachable from BB. */ VEC (occr_t, heap) *occrs_to_hoist = NULL; /* We want to insert the expression into BB only once, so @@ -2990,8 +3089,6 @@ hoist_code (void) int insn_inserted_p; occr_t occr; - bitmap_initialize (from_bbs, 0); - /* If an expression is computed in BB and is available at end of BB, hoist all occurrences dominated by BB to BB. */ if (TEST_BIT (comp[bb->index], i)) @@ -3045,13 +3142,18 @@ hoist_code (void) max_distance += (bb_size[dominated->index] - to_bb_head[INSN_UID (occr->insn)]); - /* Note if the expression would reach the dominated block - unimpared if it was placed at the end of BB. + pressure_class = get_pressure_class_and_nregs (occr->insn, + &nregs); + /* Note if the expression should be hoisted from the dominated + block to BB if it can reach DOMINATED unimpared. + Keep track of how many times this expression is hoistable from a dominated block into BB. */ - if (hoist_expr_reaches_here_p (bb, i, dominated, NULL, - max_distance, bb_size)) + if (should_hoist_expr_to_dom (bb, expr, dominated, NULL, + max_distance, bb_size, + pressure_class, &nregs, + hoisted_bbs)) { hoistable++; VEC_safe_push (occr_t, heap, @@ -3072,6 +3174,13 @@ hoist_code (void) to nullify any benefit we get from code hoisting. */ if (hoistable > 1 && dbg_cnt (hoist_insn)) { + /* Update register pressure for basic block to which expr + is hoisted. */ + if (flag_ira_hoist_pressure) + { + data = BB_DATA (bb); + data->max_reg_pressure[pressure_class] += nregs; + } /* If (hoistable != VEC_length), then there is an occurrence of EXPR in BB itself. Don't waste time looking for LCA in this case. */ @@ -3089,8 +3198,20 @@ hoist_code (void) } } else - /* Punt, no point hoisting a single occurence. */ - VEC_free (occr_t, heap, occrs_to_hoist); + { + /* Punt, no point hoisting a single occurence. */ + VEC_free (occr_t, heap, occrs_to_hoist); + /* Restore register pressure of basic block recorded in + hoisted_bbs when expr will not be hoisted. */ + if (flag_ira_hoist_pressure) + EXECUTE_IF_SET_IN_BITMAP (hoisted_bbs, 0, k, bi) + { + data = BB_DATA (BASIC_BLOCK (k)); + data->max_reg_pressure[pressure_class] -= nregs; + } + } + if (flag_ira_hoist_pressure) + bitmap_clear (hoisted_bbs); insn_inserted_p = 0; @@ -3140,6 +3261,10 @@ hoist_code (void) } VEC_free (basic_block, heap, dom_tree_walk); + BITMAP_FREE (from_bbs); + if (flag_ira_hoist_pressure) + BITMAP_FREE (hoisted_bbs); + free (bb_size); free (to_bb_head); free (index_map); @@ -3147,6 +3272,165 @@ hoist_code (void) return changed; } +/* Return pressure class and number of needed hard registers (through + *NREGS) of register REGNO. */ +static enum reg_class +get_regno_pressure_class (int regno, int *nregs) +{ + if (regno >= FIRST_PSEUDO_REGISTER) + { + enum reg_class pressure_class; + + pressure_class = reg_allocno_class (regno); + pressure_class = ira_pressure_class_translate[pressure_class]; + *nregs + = ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)]; + return pressure_class; + } + else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno) + && ! TEST_HARD_REG_BIT (eliminable_regset, regno)) + { + *nregs = 1; + return ira_pressure_class_translate[REGNO_REG_CLASS (regno)]; + } + else + { + *nregs = 0; + return NO_REGS; + } +} + +/* Return pressure class and number of hard registers (through *NREGS) + for destination of INSN. */ +static enum reg_class +get_pressure_class_and_nregs (rtx insn, int *nregs) +{ + rtx reg; + enum reg_class pressure_class; + rtx set = single_set (insn); + + /* Considered invariant insns have only one set. */ + gcc_assert (set != NULL_RTX); + reg = SET_DEST (set); + if (GET_CODE (reg) == SUBREG) + reg = SUBREG_REG (reg); + if (MEM_P (reg)) + { + *nregs = 0; + pressure_class = NO_REGS; + } + else + { + gcc_assert (REG_P (reg)); + pressure_class = reg_allocno_class (REGNO (reg)); + pressure_class = ira_pressure_class_translate[pressure_class]; + *nregs + = ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))]; + } + return pressure_class; +} + +/* Increase (if INCR_P) or decrease current register pressure for + register REGNO. */ +static void +change_pressure (int regno, bool incr_p) +{ + int nregs; + enum reg_class pressure_class; + + pressure_class = get_regno_pressure_class (regno, &nregs); + if (! incr_p) + curr_reg_pressure[pressure_class] -= nregs; + else + { + curr_reg_pressure[pressure_class] += nregs; + if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class] + < curr_reg_pressure[pressure_class]) + BB_DATA (curr_bb)->max_reg_pressure[pressure_class] + = curr_reg_pressure[pressure_class]; + } +} + +/* Calculate register pressure for each basic block by walking insns + from last to first. */ +static void +calculate_bb_reg_pressure (void) +{ + int i; + unsigned int j; + rtx insn; + basic_block bb; + bitmap curr_regs_live; + bitmap_iterator bi; + + + ira_setup_eliminable_regset (); + curr_regs_live = BITMAP_ALLOC (®_obstack); + FOR_EACH_BB (bb) + { + curr_bb = bb; + bitmap_copy (curr_regs_live, DF_LR_OUT (bb)); + for (i = 0; i < ira_pressure_classes_num; i++) + curr_reg_pressure[ira_pressure_classes[i]] = 0; + EXECUTE_IF_SET_IN_BITMAP (curr_regs_live, 0, j, bi) + change_pressure (j, true); + + FOR_BB_INSNS_REVERSE (bb, insn) + { + rtx dreg; + int regno; + df_ref *def_rec, *use_rec; + + if (! NONDEBUG_INSN_P (insn)) + continue; + + for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++) + { + dreg = DF_REF_REAL_REG (*def_rec); + gcc_assert (REG_P (dreg)); + regno = REGNO (dreg); + if (!(DF_REF_FLAGS (*def_rec) + & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) + { + if (bitmap_clear_bit (curr_regs_live, regno)) + change_pressure (regno, false); + } + } + + for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++) + { + dreg = DF_REF_REAL_REG (*use_rec); + gcc_assert (REG_P (dreg)); + regno = REGNO (dreg); + if (bitmap_set_bit (curr_regs_live, regno)) + change_pressure (regno, true); + } + } + } + BITMAP_FREE (curr_regs_live); + + if (dump_file == NULL) + return; + + fprintf (dump_file, "\nRegister Pressure: \n"); + FOR_EACH_BB (bb) + { + fprintf (dump_file, " Basic block %d: \n", bb->index); + for (i = 0; (int) i < ira_pressure_classes_num; i++) + { + enum reg_class pressure_class; + + pressure_class = ira_pressure_classes[i]; + if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0) + continue; + + fprintf (dump_file, " %s=%d\n", reg_class_names[pressure_class], + BB_DATA (bb)->max_reg_pressure[pressure_class]); + } + } + fprintf (dump_file, "\n"); +} + /* Top level routine to perform one code hoisting (aka unification) pass Return nonzero if a change was made. */ @@ -3166,6 +3450,16 @@ one_code_hoisting_pass (void) doing_code_hoisting_p = true; + /* Calculate register pressure for each basic block. */ + if (flag_ira_hoist_pressure) + { + regstat_init_n_sets_and_refs (); + ira_set_pseudo_classes (false, dump_file); + alloc_aux_for_blocks (sizeof (struct bb_data)); + calculate_bb_reg_pressure (); + regstat_free_n_sets_and_refs (); + } + /* We need alias. */ init_alias_analysis (); @@ -3186,6 +3480,11 @@ one_code_hoisting_pass (void) free_code_hoist_mem (); } + if (flag_ira_hoist_pressure) + { + free_aux_for_blocks (); + free_reg_info (); + } free_hash_table (&expr_hash_table); free_gcse_mem (); obstack_free (&gcse_obstack, NULL); Index: gcc/loop-invariant.c =================================================================== --- gcc/loop-invariant.c (revision 192194) +++ gcc/loop-invariant.c (working copy) @@ -1,5 +1,5 @@ /* RTL-level loop invariant motion. - Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010 + Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -1915,7 +1915,7 @@ move_loop_invariants (void) { df_analyze (); regstat_init_n_sets_and_refs (); - ira_set_pseudo_classes (dump_file); + ira_set_pseudo_classes (true, dump_file); calculate_loop_reg_pressure (); regstat_free_n_sets_and_refs (); } Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 192194) +++ gcc/common.opt (working copy) @@ -1392,6 +1392,11 @@ Enum(ira_region) String(all) Value(IRA_REGION_ALL) EnumValue Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED) +fira-hoist-pressure +Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization +Use IRA based register pressure calculation +in RTL hoist optimizations. + fira-loop-pressure Common Report Var(flag_ira_loop_pressure) Use IRA based register pressure calculation Index: gcc/ira.c =================================================================== --- gcc/ira.c (revision 192194) +++ gcc/ira.c (working copy) @@ -4183,7 +4183,7 @@ ira (FILE *f) crtl->is_leaf = leaf_function_p (); if (resize_reg_info () && flag_ira_loop_pressure) - ira_set_pseudo_classes (ira_dump_file); + ira_set_pseudo_classes (true, ira_dump_file); rebuild_p = update_equiv_regs (); Index: gcc/ira.h =================================================================== --- gcc/ira.h (revision 192194) +++ gcc/ira.h (working copy) @@ -1,6 +1,6 @@ /* Communication between the Integrated Register Allocator (IRA) and the rest of the compiler. - Copyright (C) 2006, 2007, 2008, 2009, 2010 + Copyright (C) 2006, 2007, 2008, 2009, 2010, 2012 Free Software Foundation, Inc. Contributed by Vladimir Makarov . @@ -131,7 +131,7 @@ extern void ira_init (void); extern void ira_finish_once (void); extern void ira_setup_eliminable_regset (void); extern rtx ira_eliminate_regs (rtx, enum machine_mode); -extern void ira_set_pseudo_classes (FILE *); +extern void ira_set_pseudo_classes (bool, FILE *); extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *); extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *); Index: gcc/ira-costs.c =================================================================== --- gcc/ira-costs.c (revision 192194) +++ gcc/ira-costs.c (working copy) @@ -2048,9 +2048,10 @@ ira_costs (void) ira_free (total_allocno_costs); } -/* Entry function which defines classes for pseudos. */ +/* Entry function which defines classes for pseudos. + Set pseudo_classes_defined_p only if DEFINE_PSEUDO_CLASSES is true. */ void -ira_set_pseudo_classes (FILE *dump_file) +ira_set_pseudo_classes (bool define_pseudo_classes, FILE *dump_file) { allocno_p = false; internal_flag_ira_verbose = flag_ira_verbose; @@ -2059,7 +2060,9 @@ void initiate_regno_cost_classes (); find_costs_and_classes (dump_file); finish_regno_cost_classes (); - pseudo_classes_defined_p = true; + if (define_pseudo_classes) + pseudo_classes_defined_p = true; + finish_costs (); }