diff mbox

[RFA] Implement register pressure directed hoist pass

Message ID 000701cda77c$a8b88260$fa298720$@cheng@arm.com
State New
Headers show

Commit Message

Bin Cheng Oct. 11, 2012, 6:50 a.m. UTC
Hi Jeff, Steven,

This is the updated patch according to your comments. The main changes
includes:
1. Enable the option for all target at Os level by default.
2. Add a target independent test case.
3. Add comments on how the algorithm works.
4. Simplify the calculation of register pressure info by using DF caches.

Jeff, as for accurately simulating the change of register pressure made by
hoisting expression, I explained a little in previous message. According to
observation, cases in which register pressure decreases are relatively rare,
so this patch does not implement it. I can continue to improve code hoisting
that way in the future.

I bootstrapped it on x86_64 and x86, also tested on x86_64/x86/arm.
I recollected size info for CSiBE on miscellaneous targets as following:
		improvement
ARM		0.12%
Thumb1		0.13%
mips		0.24%
powerpc	0.59%
Thumb2		X
x86		X
x86_64		X
(X means no obvious effect, unfortunately.)

Also, since this is the simplification of previous patch, I think there is
no slowdown in compilation.
 
Please review. Thanks.

And updated ChangeLog entry:

2012-10-11  Bin Cheng  <bin.cheng@arm.com>

	* common.opt (flag_ira_hoist_pressure): New.
	* doc/invoke.texi (-fira-hoist-pressure): Describe.
	* ira-costs.c (ira_set_pseudo_classes): New parameter.
	* ira.h: Update copyright dates.
	(ira_set_pseudo_classes): Update prototype.
	* haifa-sched.c (sched_init): Update call.
	* ira.c (ira): Update call.
	* regmove.c: Update copyright dates.
	(regmove_optimize): Update call.
	* loop-invariant.c: Update copyright dates.
	(move_loop_invariants): Update call.
	* gcse.c: Update copyright dates.
	(struct bb_data): New structure.
	(BB_DATA): New macro.
	(curr_bb, curr_reg_pressure): New static variables.
	(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
	Change parameter expr_index to expr.
	Change parameter visited.
	New parameters pressure_class, nregs and hoisted_bbs.
	Use reg pressure to determine the distance expr can be hoisted.
	(hoist_code): Use reg pressure to direct the hoist process.
	(get_regno_pressure_class, get_pressure_class_and_nregs)
	(change_pressure, calculate_bb_reg_pressure): New.
	(one_code_hoisting_pass): Calculate register pressure. Allocate
	and free data.

gcc/testsuite/ChangeLog
2012-10-11  Bin Cheng  <bin.cheng@arm.com>

	* testsuite/gcc.dg/hoist-register-pressure.c: New test.


> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Tuesday, October 02, 2012 8:58 PM
> To: Bin Cheng
> Cc: 'Steven Bosscher'; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass
> 
> On 09/29/2012 12:37 AM, Bin Cheng wrote:
> > Hi Steven,
> >
> > This is the updated patch according to your comments. Please review.
> > I also re-collected code size data and found it is improved by about
> > 0.24% for mips, which is better than previous data. I believe this
> > should be caused by recent changes in trunk, rather than by using DF
> > caches to calculate register pressure.
> >
> > Thanks.
> >
> > 2012-09-29  Bin Cheng<bin.cheng@arm.com>
> >
> > 	* common.opt (flag_ira_hoist_pressure): New.
> > 	* doc/invoke.texi (-fira-hoist-pressure): Describe.
> > 	* ira-costs.c (ira_set_pseudo_classes): New parameter.
> > 	* ira.h (ira_set_pseudo_classes): Update prototype.
> > 	* haifa-sched.c (sched_init): Update call.
> > 	* ira.c (ira): Update call.
> > 	* regmove.c (regmove_optimize): Update call.
> > 	* loop-invariant.c (move_loop_invariants): Update call.
> > 	* gcse.c (struct bb_data): New structure.
> > 	(BB_DATA): New macro.
> > 	(curr_bb, curr_regs_live, curr_reg_pressure, regs_set)
> > 	(n_regs_set): New static variables.
> > 	(hoist_expr_reaches_here_p): Use reg pressure to determine the
> > 	distance expr can be hoisted.
> > 	(hoist_code): Use reg pressure to direct the hoist process.
> > 	(get_regno_pressure_class, get_pressure_class_and_nregs)
> > 	(change_pressure, mark_regno_live, mark_regno_death)
> > 	(mark_reg_death, mark_reg_store, calculate_bb_reg_pressure): New.
> > 	(one_code_hoisting_pass): Calculate register pressure. Free data.
> > 	* config/arm/arm.c (arm_option_override): Set
flag_ira_hoist_pressure
> > 	on Thumb1 when optimizing for size.
> >
> >
> > hoist-reg-pressure-20120929.txt
> > +@item -fira-hoist-pressure
> > +@opindex fira-hoist-pressure
> > +Use IRA to evaluate register pressure in hoist pass for decisions to
hoist
> > +expressions.  This option usually results in generation of smaller code
on
> > +RISC machines, but it can slow the compiler down.
> I wouldn't use CISC/RISC here; I'd just say it usually results in
> smaller code.
> 
> You need to update the copyright year in gcse.c, ira.h, regmove.c, and
> loop-invariant.c.
> 
> > +      /* Only decrease distance if bb has high register pressure or
EXPR
> > +	 is const expr, otherwise EXPR can be hoisted through bb without
> > +	 cost.  */
> ?!?  This comment makes no sense to me.  To accurately know how hoisting
> an expression affects pressure you have to look at the inputs and output
> and see how their lifetime has changed.
> 
> In general:
> 
> For inputs, hoisting *may* reduce pressure.  You really have to look at
> how the life of the input changes based on the new location of the insn.
>   For example, if the input's lifetime is unchanged (say perhaps because
> it was live after the insn we want to hoist), then hoisting will have no
> impact. Otherwise the input's life is shortened, but to know by how much
> you have to determine whether the new death of the input occurs (it may
> still die in the hoisted insn or it may die elsewhere).
> 
> For an output, hoisting will (effectively) always extend the lifetime.
> 
> I've speculated that the right way to deal with register pressure in
> code motion is to actually build the dependency graph and use that to
> guide the code motions.  I've never cobbled together any real code to do
> this though.
> 
> Can we find a better name for hoist_expr_reaches_here_p since it's no
> longer just dealing with reachability -- it has heuristics now for
> profitability as well.
> 
> > @@ -2863,7 +2909,8 @@ static int
> >     if (visited == NULL)
> >       {
> >         visited_allocated_locally = 1;
> > -      visited = XCNEWVEC (char, last_basic_block);
> > +      visited = sbitmap_alloc (last_basic_block);
> > +      sbitmap_zero (visited);
> >       }
> What's the purpose behind changing visited from a simple array to a
> sbitmap?  I'm not objecting, but would like to hear the rationale behind
> that change.  I'll also note it wasn't mentioned in the ChangeLog.
> 
> Similarly what's the rationale behind passing the expression itself
> rather than just its index?  I don't see where we need to use anything
> other than the index in this code.  And again, this change isn't
> mentioned in the ChangeLog.
> 
> > +  /* Considered invariant insns have only one set.  */
> > +  gcc_assert (set != NULL_RTX);
> > +  reg = SET_DEST (set);
> > +  if (GET_CODE (reg) == SUBREG)
> > +    reg = SUBREG_REG (reg);
> > +  if (MEM_P (reg))
> > +    {
> > +      *nregs = 0;
> > +      pressure_class = NO_REGS;
> > +    }
> Don't you need to look at the addresses within the MEM?
> 
> 
> 
> > Index: gcc/config/arm/arm.c
> > ===================================================================
> > --- gcc/config/arm/arm.c	(revision 191816)
> > +++ gcc/config/arm/arm.c	(working copy)
> > @@ -2021,6 +2021,11 @@ arm_option_override (void)
> >         && current_tune->num_prefetch_slots > 0)
> >       flag_prefetch_loop_arrays = 1;
> >
> > +  /* Enable register pressure hoist when optimizing for size on Thumb1
set.
> */
> > +  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
> > +      && flag_ira_hoist_pressure == -1)
> > +    flag_ira_hoist_pressure = 1;
> I'd rather see this enabled for all targets when optimizing for size;
> that guarantees more testing.  Even if it doesn't help a particular
> target, as long as it isn't hurting we're better off enabling it.
> 
> I don't see any testsuite additions -- it'd really help long term to
> have some tests for this stuff.
> 
> You should address the issues noted above and repost for final review
> before likely integration.
> 
> 
> 
> jeff
>
diff mbox

Patch

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 192194)
+++ gcc/doc/invoke.texi	(working copy)
@@ -372,7 +372,7 @@  Objective-C and Objective-C++ Dialects}.
 -finline-small-functions -fipa-cp -fipa-cp-clone @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
 -fira-algorithm=@var{algorithm} @gol
--fira-region=@var{region} @gol
+-fira-region=@var{region} -fira-hoist-pressure @gol
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
@@ -6996,6 +6996,14 @@  This typically results in the smallest code size,
 
 @end table
 
+@item -fira-hoist-pressure
+@opindex fira-hoist-pressure
+Use IRA to evaluate register pressure in the code hoisting pass for
+decisions to hoist expressions.  This option usually results in smaller
+code, but it can slow the compiler down.
+
+This option is enabled at level @option{-Os} for all targets.
+
 @item -fira-loop-pressure
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decisions to move
Index: gcc/testsuite/gcc.dg/hoist-register-pressure.c
===================================================================
--- gcc/testsuite/gcc.dg/hoist-register-pressure.c	(revision 0)
+++ gcc/testsuite/gcc.dg/hoist-register-pressure.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* { dg-options "-Os -fdump-rtl-hoist" }  */
+/* { dg-final { scan-rtl-dump "PRE/HOIST: end of bb .* copying expression" "hoist" } } */
+
+#define BUF 100
+int a[BUF];
+
+void com (int);
+void bar (int);
+
+int foo (int x, int y, int z)
+{
+  /* x+y won't be hoisted without defaultly enabled "-fira-hoist-pressure",
+     because its rtx_cost is too small.  */
+  if (z)
+    {
+      a[1] = a[0] + a[2];
+      a[2] = a[1] + a[3];
+      a[3] = a[2] + a[4];
+      a[4] = a[3] + a[5];
+      a[5] = a[4] + a[6];
+      a[6] = a[5] + a[7];
+      a[7] = a[6] + a[8];
+      com (x+y);
+    }
+  else
+    {
+      bar (x+y);
+    }
+
+  return 0;
+}
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	(revision 192194)
+++ gcc/haifa-sched.c	(working copy)
@@ -6629,7 +6629,7 @@  sched_init (void)
 	/* We need info about pseudos for rtl dumps about pseudo
 	   classes and costs.  */
 	regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
+      ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
       sched_regno_pressure_class
 	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
       for (i = 0; i < max_regno; i++)
Index: gcc/regmove.c
===================================================================
--- gcc/regmove.c	(revision 192194)
+++ gcc/regmove.c	(working copy)
@@ -1,6 +1,7 @@ 
 /* Move registers around to reduce number of move instructions needed.
    Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
+   2012
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -1237,7 +1238,7 @@  regmove_optimize (void)
   regstat_compute_ri ();
 
   if (flag_ira_loop_pressure)
-    ira_set_pseudo_classes (dump_file);
+    ira_set_pseudo_classes (true, dump_file);
 
   regno_src_regno = XNEWVEC (int, nregs);
   for (i = nregs; --i >= 0; )
Index: gcc/gcse.c
===================================================================
--- gcc/gcse.c	(revision 192194)
+++ gcc/gcse.c	(working copy)
@@ -1,6 +1,6 @@ 
 /* Partial redundancy elimination / Hoisting for RTL.
    Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
-   2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -20,9 +20,11 @@  along with GCC; see the file COPYING3.  If not see
 
 /* TODO
    - reordering of memory allocation and freeing to be more space efficient
-   - do rough calc of how many regs are needed in each block, and a rough
-     calc of how many regs are available in each class and use that to
-     throttle back the code in cases where RTX_COST is minimal.
+   - simulate register pressure change of each basic block accurately during
+     hoist process.  But I doubt the benefit since most expressions hoisted
+     are constant or address, which usually won't reduce register pressure.
+   - calc rough register pressure information and use the info to drive all
+     kinds of code motion(including code hoisting) in a unified way.
 */
 
 /* References searched while implementing this.
@@ -141,11 +143,12 @@  along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "toplev.h"
 
+#include "hard-reg-set.h"
 #include "rtl.h"
 #include "tree.h"
 #include "tm_p.h"
 #include "regs.h"
-#include "hard-reg-set.h"
+#include "ira.h"
 #include "flags.h"
 #include "insn-config.h"
 #include "recog.h"
@@ -412,6 +415,22 @@  static bool doing_code_hoisting_p = false;
 /* For available exprs */
 static sbitmap *ae_kill;
 
+/* Data stored for each basic block.  */
+struct bb_data
+{
+  /* Maximal register pressure inside basic block for given register class
+     (defined only for the pressure classes).  */
+  int max_reg_pressure[N_REG_CLASSES];
+};
+
+#define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
+
+static basic_block curr_bb;
+
+/* Currently register pressure for each pressure class.  */
+static int curr_reg_pressure[N_REG_CLASSES];
+
+
 static void compute_can_copy (void);
 static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
 static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
@@ -460,9 +479,11 @@  static void alloc_code_hoist_mem (int, int);
 static void free_code_hoist_mem (void);
 static void compute_code_hoist_vbeinout (void);
 static void compute_code_hoist_data (void);
-static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *,
-				      int, int *);
+static int should_hoist_expr_to_dom (basic_block, struct expr *, basic_block,
+				     sbitmap, int, int *, enum reg_class,
+				     int *, bitmap);
 static int hoist_code (void);
+static enum reg_class get_pressure_class_and_nregs (rtx insn, int *nregs);
 static int one_code_hoisting_pass (void);
 static rtx process_insert_insn (struct expr *);
 static int pre_edge_insert (struct edge_list *, struct expr **);
@@ -1857,7 +1878,7 @@  prune_expressions (bool pre_p)
 	 a basic block we should account for any side-effects of a subsequent
 	 jump instructions that could clobber the expression.  It would
 	 be best to implement this check along the lines of
-	 hoist_expr_reaches_here_p where the target block is already known
+	 should_hoist_expr_to_dom where the target block is already known
 	 and, hence, there's no need to conservatively prune expressions on
 	 "intermediate" set-and-jump instructions.  */
       FOR_EACH_EDGE (e, ei, bb->preds)
@@ -2825,11 +2846,16 @@  compute_code_hoist_data (void)
     fprintf (dump_file, "\n");
 }
 
-/* Determine if the expression identified by EXPR_INDEX would
-   reach BB unimpared if it was placed at the end of EXPR_BB.
-   Stop the search if the expression would need to be moved more
-   than DISTANCE instructions.
+/* Determine if the expression EXPR should be hoisted to EXPR_BB up in the
+   flow graph, given if it can reach BB unimpared.  Stop the search if the
+   expression would need to be moved more than DISTANCE instructions.
 
+   PRESSURE_CLASS and NREGS are register class and number of hard registers
+   for storing EXPR.
+
+   HOISTED_BBS points to a bitmap indicating basic blocks through which
+   EXPR is hoisted.
+
    It's unclear exactly what Muchnick meant by "unimpared".  It seems
    to me that the expression must either be computed or transparent in
    *every* block in the path(s) from EXPR_BB to BB.  Any other definition
@@ -2841,18 +2867,30 @@  compute_code_hoist_data (void)
    paths.  */
 
 static int
-hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
-			   char *visited, int distance, int *bb_size)
+should_hoist_expr_to_dom (basic_block expr_bb, struct expr *expr,
+			  basic_block bb, sbitmap visited, int distance,
+			  int *bb_size, enum reg_class pressure_class,
+			  int *nregs, bitmap hoisted_bbs)
 {
+  unsigned int i;
   edge pred;
   edge_iterator ei;
+  sbitmap_iterator sbi;
   int visited_allocated_locally = 0;
 
   /* Terminate the search if distance, for which EXPR is allowed to move,
      is exhausted.  */
   if (distance > 0)
     {
-      distance -= bb_size[bb->index];
+      /* Let EXPR be hoisted through basic block at no cost if the block
+	 has low register pressure.  An exception is constant expression,
+	 becasue hoisting constant expr aggresively results in worse code
+	 as observed.  */
+      if (!flag_ira_hoist_pressure
+	  || (BB_DATA (bb)->max_reg_pressure[pressure_class]
+		>= ira_class_hard_regs_num[pressure_class]
+	      || CONST_INT_P (expr->expr)))
+	distance -= bb_size[bb->index];
 
       if (distance <= 0)
 	return 0;
@@ -2863,7 +2901,8 @@  static int
   if (visited == NULL)
     {
       visited_allocated_locally = 1;
-      visited = XCNEWVEC (char, last_basic_block);
+      visited = sbitmap_alloc (last_basic_block);
+      sbitmap_zero (visited);
     }
 
   FOR_EACH_EDGE (pred, ei, bb->preds)
@@ -2874,23 +2913,37 @@  static int
 	break;
       else if (pred_bb == expr_bb)
 	continue;
-      else if (visited[pred_bb->index])
+      else if (TEST_BIT (visited, pred_bb->index))
 	continue;
-
-      else if (! TEST_BIT (transp[pred_bb->index], expr_index))
+      else if (! TEST_BIT (transp[pred_bb->index], expr->bitmap_index))
 	break;
-
       /* Not killed.  */
       else
 	{
-	  visited[pred_bb->index] = 1;
-	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
-					   visited, distance, bb_size))
+	  SET_BIT (visited, pred_bb->index);
+	  if (! should_hoist_expr_to_dom (expr_bb, expr, pred_bb,
+					  visited, distance, bb_size,
+					  pressure_class, nregs, hoisted_bbs))
 	    break;
 	}
     }
   if (visited_allocated_locally)
-    free (visited);
+    {
+      /* If EXPR can be hoisted to expr_bb, record basic blocks through
+	 which EXPR is hoisted in hoisted_bbs.  Also update register
+	 pressure for basic blocks newly added in hoisted_bbs.  */
+      if (flag_ira_hoist_pressure && !pred)
+	{
+	  EXECUTE_IF_SET_IN_SBITMAP (visited, 0, i, sbi)
+	    if (!bitmap_bit_p (hoisted_bbs, i))
+	      {
+		bitmap_set_bit (hoisted_bbs, i);
+		BB_DATA (BASIC_BLOCK (i))->max_reg_pressure[pressure_class]
+		    += *nregs;
+	      }
+	}
+      sbitmap_free (visited);
+    }
 
   return (pred == NULL);
 }
@@ -2907,8 +2960,44 @@  find_occr_in_bb (struct occr *occr, basic_block bb
   return occr;
 }
 
-/* Actually perform code hoisting.  */
+/* Actually perform code hoisting.
 
+   The code hoisting pass hoists multiple computations of expression
+   to dominator basic block, like from b2/b3 to b1 as depicted below:
+
+          b1      ------
+          /\         |
+         /  \        |
+        bx   by   distance
+       /      \      |
+      /        \     |
+     b2        b3 ------
+
+   Unfortunately code hoisting generally extend live range of output
+   pseudo register, which increases register pressure and hurts register
+   allocation.  To address this issue, an attribute MAX_DISTANCE is
+   computed and attached to each expression.  The attribute is computed
+   from rtx cost of corresponding expression and it's used to control
+   how long the expression can be hoisted up in flow graph.  As expression
+   is hoisted up in flow graph, GCC decreases its DISTANCE and stops the
+   hoist if DISTANCE reaches 0.
+
+   Option "-fira-hoist-pressure" implements register pressure directed
+   hoist based on upper method.  The rationale is:
+     1. Calculate register pressure for each basic block by reusing IRA
+	facility.
+     2. When expression is hoisted through one basic block, GCC checks
+	register pressure of the basic block and decrease DISTANCE only
+	when the register pressure is high.  In other words, expression
+	will be hoisted through basic block with low register pressure
+	at no cost.
+     3. Update register pressure information for basic blocks through
+ 	which expression is hoisted.
+	TODO: It is possible to have register pressure decreased because
+	of shrinked live range of input pseudo register when hoisting an
+	expression.  For now, this effect is not simulated and we just
+	increase register pressure for hoisted expressions.  */
+
 static int
 hoist_code (void)
 {
@@ -2916,12 +3005,18 @@  hoist_code (void)
   VEC (basic_block, heap) *dom_tree_walk;
   unsigned int dom_tree_walk_index;
   VEC (basic_block, heap) *domby;
-  unsigned int i,j;
+  unsigned int i, j, k;
   struct expr **index_map;
   struct expr *expr;
   int *to_bb_head;
   int *bb_size;
   int changed = 0;
+  struct bb_data *data;
+  /* Basic blocks that have occurrences reachable from BB.  */
+  bitmap from_bbs;
+  /* Basic blocks through which expr is hoisted.  */
+  bitmap hoisted_bbs = NULL;
+  bitmap_iterator bi;
 
   /* Compute a mapping from expression number (`bitmap_index') to
      hash table entry.  */
@@ -2959,6 +3054,10 @@  hoist_code (void)
 	      && (EDGE_SUCC (ENTRY_BLOCK_PTR, 0)->dest
 		  == ENTRY_BLOCK_PTR->next_bb));
 
+  from_bbs = BITMAP_ALLOC (NULL);
+  if (flag_ira_hoist_pressure)
+    hoisted_bbs = BITMAP_ALLOC (NULL);
+
   dom_tree_walk = get_all_dominated_blocks (CDI_DOMINATORS,
 					    ENTRY_BLOCK_PTR->next_bb);
 
@@ -2977,12 +3076,12 @@  hoist_code (void)
 	{
 	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
+	      int nregs = 0;
+	      enum reg_class pressure_class = NO_REGS;
 	      /* Current expression.  */
 	      struct expr *expr = index_map[i];
 	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
 	      int hoistable = 0;
-	      /* Basic blocks that have occurrences reachable from BB.  */
-	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
 	      /* Occurrences reachable from BB.  */
 	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
 	      /* We want to insert the expression into BB only once, so
@@ -2990,8 +3089,6 @@  hoist_code (void)
 	      int insn_inserted_p;
 	      occr_t occr;
 
-	      bitmap_initialize (from_bbs, 0);
-
 	      /* If an expression is computed in BB and is available at end of
 		 BB, hoist all occurrences dominated by BB to BB.  */
 	      if (TEST_BIT (comp[bb->index], i))
@@ -3045,13 +3142,18 @@  hoist_code (void)
 		    max_distance += (bb_size[dominated->index]
 				     - to_bb_head[INSN_UID (occr->insn)]);
 
-		  /* Note if the expression would reach the dominated block
-		     unimpared if it was placed at the end of BB.
+		  pressure_class = get_pressure_class_and_nregs (occr->insn,
+								 &nregs);
 
+		  /* Note if the expression should be hoisted from the dominated
+		     block to BB if it can reach DOMINATED unimpared.
+
 		     Keep track of how many times this expression is hoistable
 		     from a dominated block into BB.  */
-		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
-						 max_distance, bb_size))
+		  if (should_hoist_expr_to_dom (bb, expr, dominated, NULL,
+						max_distance, bb_size,
+						pressure_class,	&nregs,
+						hoisted_bbs))
 		    {
 		      hoistable++;
 		      VEC_safe_push (occr_t, heap,
@@ -3072,6 +3174,13 @@  hoist_code (void)
 		 to nullify any benefit we get from code hoisting.  */
 	      if (hoistable > 1 && dbg_cnt (hoist_insn))
 		{
+		  /* Update register pressure for basic block to which expr
+		     is hoisted.  */
+		  if (flag_ira_hoist_pressure)
+		    {
+		      data = BB_DATA (bb);
+		      data->max_reg_pressure[pressure_class] += nregs;
+		    }
 		  /* If (hoistable != VEC_length), then there is
 		     an occurrence of EXPR in BB itself.  Don't waste
 		     time looking for LCA in this case.  */
@@ -3089,8 +3198,20 @@  hoist_code (void)
 		    }
 		}
 	      else
-		/* Punt, no point hoisting a single occurence.  */
-		VEC_free (occr_t, heap, occrs_to_hoist);
+		{
+		  /* Punt, no point hoisting a single occurence.  */
+		  VEC_free (occr_t, heap, occrs_to_hoist);
+		  /* Restore register pressure of basic block recorded in
+ 		     hoisted_bbs when expr will not be hoisted.  */
+		  if (flag_ira_hoist_pressure)
+		    EXECUTE_IF_SET_IN_BITMAP (hoisted_bbs, 0, k, bi)
+		      {
+			data = BB_DATA (BASIC_BLOCK (k));
+			data->max_reg_pressure[pressure_class] -= nregs;
+		      }
+		}
+	      if (flag_ira_hoist_pressure)
+		bitmap_clear (hoisted_bbs);
 
 	      insn_inserted_p = 0;
 
@@ -3140,6 +3261,10 @@  hoist_code (void)
     }
 
   VEC_free (basic_block, heap, dom_tree_walk);
+  BITMAP_FREE (from_bbs);
+  if (flag_ira_hoist_pressure)
+    BITMAP_FREE (hoisted_bbs);
+
   free (bb_size);
   free (to_bb_head);
   free (index_map);
@@ -3147,6 +3272,165 @@  hoist_code (void)
   return changed;
 }
 
+/* Return pressure class and number of needed hard registers (through
+   *NREGS) of register REGNO.  */
+static enum reg_class
+get_regno_pressure_class (int regno, int *nregs)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    {
+      enum reg_class pressure_class;
+
+      pressure_class = reg_allocno_class (regno);
+      pressure_class = ira_pressure_class_translate[pressure_class];
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
+      return pressure_class;
+    }
+  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+	   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
+    {
+      *nregs = 1;
+      return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
+    }
+  else
+    {
+      *nregs = 0;
+      return NO_REGS;
+    }
+}
+
+/* Return pressure class and number of hard registers (through *NREGS)
+   for destination of INSN. */
+static enum reg_class
+get_pressure_class_and_nregs (rtx insn, int *nregs)
+{
+  rtx reg;
+  enum reg_class pressure_class;
+  rtx set = single_set (insn);
+
+  /* Considered invariant insns have only one set.  */
+  gcc_assert (set != NULL_RTX);
+  reg = SET_DEST (set);
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  if (MEM_P (reg))
+    {
+      *nregs = 0;
+      pressure_class = NO_REGS;
+    }
+  else
+    {
+      gcc_assert (REG_P (reg));
+      pressure_class = reg_allocno_class (REGNO (reg));
+      pressure_class = ira_pressure_class_translate[pressure_class];
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
+    }
+  return pressure_class;
+}
+
+/* Increase (if INCR_P) or decrease current register pressure for
+   register REGNO.  */
+static void
+change_pressure (int regno, bool incr_p)
+{
+  int nregs;
+  enum reg_class pressure_class;
+
+  pressure_class = get_regno_pressure_class (regno, &nregs);
+  if (! incr_p)
+    curr_reg_pressure[pressure_class] -= nregs;
+  else
+    {
+      curr_reg_pressure[pressure_class] += nregs;
+      if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  < curr_reg_pressure[pressure_class])
+	BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  = curr_reg_pressure[pressure_class];
+    }
+}
+
+/* Calculate register pressure for each basic block by walking insns
+   from last to first.  */
+static void
+calculate_bb_reg_pressure (void)
+{
+  int i;
+  unsigned int j;
+  rtx insn;
+  basic_block bb;
+  bitmap curr_regs_live;
+  bitmap_iterator bi;
+
+
+  ira_setup_eliminable_regset ();
+  curr_regs_live = BITMAP_ALLOC (&reg_obstack);
+  FOR_EACH_BB (bb)
+    {
+      curr_bb = bb;
+      bitmap_copy (curr_regs_live, DF_LR_OUT (bb));
+      for (i = 0; i < ira_pressure_classes_num; i++)
+	curr_reg_pressure[ira_pressure_classes[i]] = 0;
+      EXECUTE_IF_SET_IN_BITMAP (curr_regs_live, 0, j, bi)
+	change_pressure (j, true);
+
+      FOR_BB_INSNS_REVERSE (bb, insn)
+	{
+	  rtx dreg;
+	  int regno;
+	  df_ref *def_rec, *use_rec;
+
+	  if (! NONDEBUG_INSN_P (insn))
+	    continue;
+
+	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+	    {
+	      dreg = DF_REF_REAL_REG (*def_rec);
+	      gcc_assert (REG_P (dreg));
+	      regno = REGNO (dreg);
+	      if (!(DF_REF_FLAGS (*def_rec) 
+		    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+		{
+		  if (bitmap_clear_bit (curr_regs_live, regno))
+		    change_pressure (regno, false);
+		}
+	    }
+
+	  for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+	    {
+	      dreg = DF_REF_REAL_REG (*use_rec);
+	      gcc_assert (REG_P (dreg));
+	      regno = REGNO (dreg);
+	      if (bitmap_set_bit (curr_regs_live, regno))
+		change_pressure (regno, true);
+	    }
+	}
+    }
+  BITMAP_FREE (curr_regs_live);
+
+  if (dump_file == NULL)
+    return;
+
+  fprintf (dump_file, "\nRegister Pressure: \n");
+  FOR_EACH_BB (bb)
+    {
+      fprintf (dump_file, "  Basic block %d: \n", bb->index);
+      for (i = 0; (int) i < ira_pressure_classes_num; i++)
+	{
+	  enum reg_class pressure_class;
+
+	  pressure_class = ira_pressure_classes[i];
+	  if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0)
+	    continue;
+
+	  fprintf (dump_file, "    %s=%d\n", reg_class_names[pressure_class],
+		   BB_DATA (bb)->max_reg_pressure[pressure_class]);
+	}
+    }
+  fprintf (dump_file, "\n");
+}
+
 /* Top level routine to perform one code hoisting (aka unification) pass
 
    Return nonzero if a change was made.  */
@@ -3166,6 +3450,16 @@  one_code_hoisting_pass (void)
 
   doing_code_hoisting_p = true;
 
+  /* Calculate register pressure for each basic block.  */
+  if (flag_ira_hoist_pressure)
+    {
+      regstat_init_n_sets_and_refs ();
+      ira_set_pseudo_classes (false, dump_file);
+      alloc_aux_for_blocks (sizeof (struct bb_data));
+      calculate_bb_reg_pressure ();
+      regstat_free_n_sets_and_refs ();
+    }
+
   /* We need alias.  */
   init_alias_analysis ();
 
@@ -3186,6 +3480,11 @@  one_code_hoisting_pass (void)
       free_code_hoist_mem ();
     }
 
+  if (flag_ira_hoist_pressure)
+    {
+      free_aux_for_blocks ();
+      free_reg_info ();
+    }
   free_hash_table (&expr_hash_table);
   free_gcse_mem ();
   obstack_free (&gcse_obstack, NULL);
Index: gcc/loop-invariant.c
===================================================================
--- gcc/loop-invariant.c	(revision 192194)
+++ gcc/loop-invariant.c	(working copy)
@@ -1,5 +1,5 @@ 
 /* RTL-level loop invariant motion.
-   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -1915,7 +1915,7 @@  move_loop_invariants (void)
     {
       df_analyze ();
       regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (dump_file);
+      ira_set_pseudo_classes (true, dump_file);
       calculate_loop_reg_pressure ();
       regstat_free_n_sets_and_refs ();
     }
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 192194)
+++ gcc/common.opt	(working copy)
@@ -1392,6 +1392,11 @@  Enum(ira_region) String(all) Value(IRA_REGION_ALL)
 EnumValue
 Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
 
+fira-hoist-pressure
+Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization
+Use IRA based register pressure calculation
+in RTL hoist optimizations.
+
 fira-loop-pressure
 Common Report Var(flag_ira_loop_pressure)
 Use IRA based register pressure calculation
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	(revision 192194)
+++ gcc/ira.c	(working copy)
@@ -4183,7 +4183,7 @@  ira (FILE *f)
   crtl->is_leaf = leaf_function_p ();
 
   if (resize_reg_info () && flag_ira_loop_pressure)
-    ira_set_pseudo_classes (ira_dump_file);
+    ira_set_pseudo_classes (true, ira_dump_file);
 
   rebuild_p = update_equiv_regs ();
 
Index: gcc/ira.h
===================================================================
--- gcc/ira.h	(revision 192194)
+++ gcc/ira.h	(working copy)
@@ -1,6 +1,6 @@ 
 /* Communication between the Integrated Register Allocator (IRA) and
    the rest of the compiler.
-   Copyright (C) 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2006, 2007, 2008, 2009, 2010, 2012
    Free Software Foundation, Inc.
    Contributed by Vladimir Makarov <vmakarov@redhat.com>.
 
@@ -131,7 +131,7 @@  extern void ira_init (void);
 extern void ira_finish_once (void);
 extern void ira_setup_eliminable_regset (void);
 extern rtx ira_eliminate_regs (rtx, enum machine_mode);
-extern void ira_set_pseudo_classes (FILE *);
+extern void ira_set_pseudo_classes (bool, FILE *);
 extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
 
 extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	(revision 192194)
+++ gcc/ira-costs.c	(working copy)
@@ -2048,9 +2048,10 @@  ira_costs (void)
   ira_free (total_allocno_costs);
 }
 
-/* Entry function which defines classes for pseudos.  */
+/* Entry function which defines classes for pseudos.
+   Set pseudo_classes_defined_p only if DEFINE_PSEUDO_CLASSES is true.  */
 void
-ira_set_pseudo_classes (FILE *dump_file)
+ira_set_pseudo_classes (bool define_pseudo_classes, FILE *dump_file)
 {
   allocno_p = false;
   internal_flag_ira_verbose = flag_ira_verbose;
@@ -2059,7 +2060,9 @@  void
   initiate_regno_cost_classes ();
   find_costs_and_classes (dump_file);
   finish_regno_cost_classes ();
-  pseudo_classes_defined_p = true;
+  if (define_pseudo_classes)
+    pseudo_classes_defined_p = true;
+
   finish_costs ();
 }