Patchwork [RFA] Implement register pressure directed hoist pass

Submitter Bin Cheng
Date Sept. 28, 2012, 7:18 a.m.
Message ID <00a001cd9d49$7a1eba60$6e5c2f20$@cheng@arm.com>
Permalink /patch/187724/
State New

Comments

Bin Cheng - Sept. 28, 2012, 7:18 a.m.
Hi,

This patch implements a register pressure directed hoist pass. Basically it
calculates the register pressure of each basic block and uses that information
to determine the hoist distance of each candidate expression. The register
pressure is calculated by reusing IRA utilities.
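
As a rough illustration of the per-block calculation, here is a simplified, standalone model. It is not code from the patch. It assumes a single pressure class and one hard register per register. Each insn is reduced to hypothetical lists of set registers and REG_DEAD/REG_UNUSED notes, and clobber and REG_INC handling are left out. All struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_REGS 64 /* hypothetical bound on register numbers */
#define MAX_OPS 4   /* hypothetical bound on registers per list */

/* Hypothetical stand-in for one non-debug insn.  */
struct insn_model
{
  int n_sets, sets[MAX_OPS];     /* registers stored by the insn */
  int n_dead, dead[MAX_OPS];     /* registers with a REG_DEAD note */
  int n_unused, unused[MAX_OPS]; /* registers with a REG_UNUSED note */
};

struct pressure_state
{
  bool live[MAX_REGS];
  int curr; /* current pressure */
  int max;  /* maximal pressure seen so far in the block */
};

/* Mark REGNO live; pressure only grows if it was not live yet.  */
static void
mark_live (struct pressure_state *s, int regno)
{
  assert (regno >= 0 && regno < MAX_REGS);
  if (s->live[regno])
    return;
  s->live[regno] = true;
  if (++s->curr > s->max)
    s->max = s->curr;
}

/* Mark REGNO dead; pressure only drops if it was live.  */
static void
mark_dead (struct pressure_state *s, int regno)
{
  assert (regno >= 0 && regno < MAX_REGS);
  if (!s->live[regno])
    return;
  s->live[regno] = false;
  s->curr--;
}

/* Return the maximal pressure of a block with live-in registers
   LIVE_IN[0..N_LIVE_IN-1] and instructions INSNS[0..N_INSNS-1].  */
static int
block_max_pressure (const int *live_in, int n_live_in,
                    const struct insn_model *insns, int n_insns)
{
  struct pressure_state s;
  memset (&s, 0, sizeof s);

  /* Registers live on entry (DF_LR_IN in the patch) count first.  */
  for (int i = 0; i < n_live_in; i++)
    mark_live (&s, live_in[i]);

  for (int i = 0; i < n_insns; i++)
    {
      const struct insn_model *insn = &insns[i];
      /* Inputs that die here free their registers first...  */
      for (int j = 0; j < insn->n_dead; j++)
        mark_dead (&s, insn->dead[j]);
      /* ...then the registers set by the insn become live...  */
      for (int j = 0; j < insn->n_sets; j++)
        mark_live (&s, insn->sets[j]);
      /* ...and results that are never used die right away.  */
      for (int j = 0; j < insn->n_unused; j++)
        mark_dead (&s, insn->unused[j]);
    }
  return s.max;
}
```

Consider registers 1 and 2 live on entry, an insn in which register 1 dies and register 3 is set, and then an insn that sets register 4. The recorded maximum pressure for that block is 3.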

I measured the benefit on the Thumb1/Thumb2/ARM/x86/MIPS instruction sets and
targets. On CSiBE, it improves code size by more than 0.1% on the Thumb1/ARM
instruction sets and by nearly 0.2% on MIPS, all with very small regressions.
Since the hoist pass itself improves code size by only about 0.1% on the
Thumb1 instruction set, this is a considerable improvement.
Unfortunately, this patch has no obvious effect on Thumb2 and x86, so for now
I have enabled it only on Thumb1 when optimizing for size. Other targets can
enable it as needed once it is upstream.
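
In common.opt the new flag is declared with Init(-1). This lets arm_option_override tell "not specified" apart from an explicit -fira-hoist-pressure (1) or -fno-ira-hoist-pressure (0). The model below shows that resolution. The helper names are hypothetical and do not exist in GCC, and "optimize_size" stands in for the optimize_function_for_size_p (cfun) test. It also shows why the pass checks flag_ira_hoist_pressure == 1 rather than plain truthiness: a leftover -1 must behave as disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper modeling the arm_option_override hunk.
   CMDLINE_FLAG is -1 if neither -fira-hoist-pressure nor
   -fno-ira-hoist-pressure was given (the Init(-1) default),
   0 for the -fno- form and 1 for the positive form.  */
static int
resolve_hoist_pressure (int cmdline_flag, bool target_thumb1,
                        bool optimize_size)
{
  assert (cmdline_flag >= -1 && cmdline_flag <= 1);
  /* The target only fills in a default; an explicit user choice wins.  */
  if (target_thumb1 && optimize_size && cmdline_flag == -1)
    return 1;
  return cmdline_flag;
}

/* The pass tests "flag_ira_hoist_pressure == 1", so a leftover -1
   behaves like "disabled".  */
static bool
hoist_pressure_enabled (int flag)
{
  return flag == 1;
}
```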

Apart from the change in the hoist pass, this patch also changes the prototype
of the function ira_set_pseudo_classes in IRA. This change makes IRA
recalculate the cost information by itself, rather than reuse the information
calculated by the hoist pass, because hoisting is an early pass and its
information cannot be used directly in IRA. You can refer to
http://gcc.gnu.org/ml/gcc/2012-08/msg00299.html for some discussion.
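
The effect of the new boolean parameter can be modeled as follows. The hoist pass passes false. The existing callers (sched_init, regmove_optimize, move_loop_invariants and ira) pass true and so keep their old behavior. The function names are hypothetical. This is a deliberately reduced sketch: how IRA actually uses pseudo_classes_defined_p is more involved than a single boolean test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the IRA state touched by the patch.  */
static bool pseudo_classes_defined_p = false;

/* Models ira_set_pseudo_classes (DEFINE_PSEUDO_CLASSES, DUMP_FILE):
   classes are computed for the caller either way, but are only
   published for reuse by IRA when DEFINE_PSEUDO_CLASSES is true.  */
static void
model_set_pseudo_classes (bool define_pseudo_classes)
{
  /* ...pseudo cost and class computation would happen here...  */
  if (define_pseudo_classes)
    pseudo_classes_defined_p = true;
}

/* Models the later IRA run: true if it may start from the published
   classes, false if it must recompute them by itself.  */
static bool
model_ira_reuses_classes (void)
{
  return pseudo_classes_defined_p;
}
```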

I bootstrapped GCC on x86_64 at trunk r190769, because the head revision
failed to bootstrap at the time I was working on this. The results are:
	bootstrap                        time(real/user/sys)
	trunk/Os                         122m9s/118m30s/19m48s
	patched/Os                       122m20s/118m19s/19m52s
	patched/Os/fira-hoist-pressure   120m47s/119m9s/19m38s
The patch does not appear to slow down GCC compilation noticeably. I also
measured miscellaneous generated binaries such as cc1/cc1plus: the code size
of the text section improves by only 0.05% on x86_64, less than on ARM/MIPS
(0.1-0.2%).
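
To put the bootstrap timings in relative terms, the two small helpers below convert the "<min>m<sec>s" readings to seconds and compute the percentage change against trunk. The helper names are illustrative only.

```c
#include <assert.h>

/* Convert a "<M>m<S>s" reading, already split into M and S, to seconds.  */
static int
to_seconds (int minutes, int seconds)
{
  assert (minutes >= 0 && seconds >= 0 && seconds < 60);
  return minutes * 60 + seconds;
}

/* Relative change of NEW_S against BASE_S, in percent.  */
static double
percent_change (int base_s, int new_s)
{
  assert (base_s > 0);
  return 100.0 * (new_s - base_s) / base_s;
}
```

For real time, trunk/Os takes 7329 s and patched/Os takes 7340 s (about +0.15%). Patched/Os with -fira-hoist-pressure takes 7247 s (about -1.1%). For user time the figures are 7110 s, 7099 s (about -0.15%) and 7149 s (about +0.55%). Every change is within roughly 1.2% of trunk.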

I ran the regression tests on cortex-m3/cortex-m0/x86 with -Os and everything
was fine.

Is it ok for upstream?

Thanks

2012-09-28  Bin Cheng  <bin.cheng@arm.com>

	* common.opt (flag_ira_hoist_pressure): New.
	* doc/invoke.texi (-fira-hoist-pressure): Describe.
	* ira-costs.c (ira_set_pseudo_classes): New parameter.
	* ira.h (ira_set_pseudo_classes): Update prototype.
	* haifa-sched.c (sched_init): Update call.
	* ira.c (ira): Update call.
	* regmove.c (regmove_optimize): Update call.
	* loop-invariant.c (move_loop_invariants): Update call.
	* gcse.c (struct bb_data): New structure.
	(BB_DATA): New macro.
	(curr_bb, curr_regs_live, curr_reg_pressure, regs_set, n_regs_set):
	New static variables.
	(hoist_expr_reaches_here_p): Use reg pressure to determine the
	distance expr can be hoisted.
	(hoist_code): Use reg pressure to direct the hoist process.
	(get_regno_pressure_class, get_pressure_class_and_nregs)
	(change_pressure, mark_regno_live, mark_regno_death, mark_reg_death)
	(mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)
	(free_bb_data): New.
	(one_code_hoisting_pass): Calculate register pressure. Free data.
	* config/arm/arm.c (arm_option_override): Set flag_ira_hoist_pressure
	on Thumb1 when optimizing for size.
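
The distance rule in hoist_expr_reaches_here_p and the pressure bookkeeping in hoist_code can be modeled on a straight-line path of blocks. Under pressure direction, a block consumes hoist distance only if its recorded pressure is at or above the class's hard register count (ira_class_hard_regs_num in the patch), or if the expression is a CONST_INT. Once an expression is hoisted, the blocks it passes through get NREGS added to their pressure. If the hoist is abandoned, that amount is subtracted again. This is a simplified sketch, not the patch's code: the real pass walks the CFG recursively, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-block data: the instruction-count cost of the block
   (bb_size in gcse.c) and its recorded maximal pressure for EXPR's
   pressure class (max_reg_pressure[pressure_class] in the patch).  */
struct block_model
{
  int size;
  int max_pressure;
};

/* Distance left after moving EXPR through BB.  With pressure direction
   on, only blocks at or above CLASS_HARD_REGS, or constant-integer
   expressions, consume distance.  */
static int
distance_after_block (int distance, const struct block_model *bb,
                      int class_hard_regs, bool expr_is_const_int,
                      bool pressure_directed)
{
  if (!pressure_directed
      || bb->max_pressure >= class_hard_regs
      || expr_is_const_int)
    distance -= bb->size;
  return distance;
}

/* Straight-line simplification of the recursive CFG walk: PATH[0..N-1]
   are the blocks between the occurrence and the hoist target.  A
   non-positive starting DISTANCE means "no limit", mirroring the
   "if (distance > 0)" guard.  */
static bool
reaches_along_path (int distance, const struct block_model *path, int n,
                    int class_hard_regs, bool expr_is_const_int,
                    bool pressure_directed)
{
  for (int i = 0; i < n; i++)
    if (distance > 0)
      {
        distance = distance_after_block (distance, &path[i], class_hard_regs,
                                         expr_is_const_int, pressure_directed);
        if (distance <= 0)
          return false;
      }
  return true;
}

/* Once EXPR is hoisted, its value is live across the blocks it passed
   through, so their pressure grows by NREGS; call with -NREGS to undo
   this when the hoist is abandoned.  */
static void
adjust_path_pressure (struct block_model *path, int n, int delta)
{
  for (int i = 0; i < n; i++)
    path[i].max_pressure += delta;
}
```

The design choice this models is that hoisting through a low-pressure block is treated as free. As a result, cheap expressions can travel further on targets like Thumb1, where the original distance limit was the main brake.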

Patch

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 191816)
+++ gcc/doc/invoke.texi	(working copy)
@@ -370,7 +370,7 @@  Objective-C and Objective-C++ Dialects}.
 -finline-small-functions -fipa-cp -fipa-cp-clone @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
 -fira-algorithm=@var{algorithm} @gol
--fira-region=@var{region} @gol
+-fira-region=@var{region} -fira-hoist-pressure @gol
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
@@ -6904,6 +6904,14 @@  This typically results in the smallest code size,
 
 @end table
 
+@item -fira-hoist-pressure
+@opindex fira-hoist-pressure
+Use IRA to evaluate register pressure in the code hoisting pass for
+decisions to hoist expressions.  This option usually results in generation
+of smaller code on RISC machines, but it can slow the compiler down.
+
+This option is enabled at level @option{-Os} for some targets.
+
 @item -fira-loop-pressure
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decisions to move
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	(revision 191816)
+++ gcc/haifa-sched.c	(working copy)
@@ -6629,7 +6629,7 @@  sched_init (void)
 	/* We need info about pseudos for rtl dumps about pseudo
 	   classes and costs.  */
 	regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
+      ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
       sched_regno_pressure_class
 	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
       for (i = 0; i < max_regno; i++)
Index: gcc/regmove.c
===================================================================
--- gcc/regmove.c	(revision 191816)
+++ gcc/regmove.c	(working copy)
@@ -1237,7 +1237,7 @@  regmove_optimize (void)
   regstat_compute_ri ();
 
   if (flag_ira_loop_pressure)
-    ira_set_pseudo_classes (dump_file);
+    ira_set_pseudo_classes (true, dump_file);
 
   regno_src_regno = XNEWVEC (int, nregs);
   for (i = nregs; --i >= 0; )
Index: gcc/gcse.c
===================================================================
--- gcc/gcse.c	(revision 191816)
+++ gcc/gcse.c	(working copy)
@@ -20,9 +20,9 @@  along with GCC; see the file COPYING3.  If not see
 
 /* TODO
    - reordering of memory allocation and freeing to be more space efficient
-   - do rough calc of how many regs are needed in each block, and a rough
-     calc of how many regs are available in each class and use that to
-     throttle back the code in cases where RTX_COST is minimal.
+   - accurately simulate the register pressure change of each basic block
+     during hoisting.  The benefit is doubtful, since most hoisted expressions
+     are constants or addresses, which usually won't reduce register pressure.
 */
 
 /* References searched while implementing this.
@@ -141,11 +141,12 @@  along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "toplev.h"
 
+#include "hard-reg-set.h"
 #include "rtl.h"
 #include "tree.h"
 #include "tm_p.h"
 #include "regs.h"
-#include "hard-reg-set.h"
+#include "ira.h"
 #include "flags.h"
 #include "insn-config.h"
 #include "recog.h"
@@ -412,6 +413,33 @@  static bool doing_code_hoisting_p = false;
 /* For available exprs */
 static sbitmap *ae_kill;
 
+/* Data stored for each basic block.  */
+struct bb_data
+{
+  /* Maximal register pressure inside basic block for given register class
+     (defined only for the pressure classes).  */
+  int max_reg_pressure[N_REG_CLASSES];
+};
+
+#define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
+
+static basic_block curr_bb;
+
+/* Registers currently live.  */
+static bitmap_head curr_regs_live;
+
+/* Current register pressure for each pressure class.  */
+static int curr_reg_pressure[N_REG_CLASSES];
+
+/* Record all regs that are set in any one insn.  Communication from
+   mark_reg_{store,clobber} to calculate_bb_reg_pressure.  Asm can refer
+   to all hard registers.  */
+static rtx regs_set[(FIRST_PSEUDO_REGISTER > MAX_RECOG_OPERANDS
+		     ? FIRST_PSEUDO_REGISTER : MAX_RECOG_OPERANDS) * 2];
+/* Number of regs stored in the previous array.  */
+static int n_regs_set;
+
+
 static void compute_can_copy (void);
 static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
 static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
@@ -460,9 +488,11 @@  static void alloc_code_hoist_mem (int, int);
 static void free_code_hoist_mem (void);
 static void compute_code_hoist_vbeinout (void);
 static void compute_code_hoist_data (void);
-static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *,
-				      int, int *);
+static int hoist_expr_reaches_here_p (basic_block, struct expr*, basic_block,
+				      sbitmap, int, int *, enum reg_class,
+				      int *, bitmap_head *);
 static int hoist_code (void);
+static enum reg_class get_pressure_class_and_nregs (rtx insn, int *nregs);
 static int one_code_hoisting_pass (void);
 static rtx process_insert_insn (struct expr *);
 static int pre_edge_insert (struct edge_list *, struct expr **);
@@ -2825,11 +2855,16 @@  compute_code_hoist_data (void)
     fprintf (dump_file, "\n");
 }
 
-/* Determine if the expression identified by EXPR_INDEX would
-   reach BB unimpared if it was placed at the end of EXPR_BB.
-   Stop the search if the expression would need to be moved more
-   than DISTANCE instructions.
+/* Determine if the expression EXPR would reach BB unimpared if it was
+   placed at the end of EXPR_BB. Stop the search if the expression would
+   need to be moved more than DISTANCE instructions.
 
+   PRESSURE_CLASS and NREGS are the register class and the number of hard
+   registers needed to store EXPR.
+
+   HOISTED_BBS points to a bitmap indicating basic blocks through which
+   EXPR is hoisted.
+
    It's unclear exactly what Muchnick meant by "unimpared".  It seems
    to me that the expression must either be computed or transparent in
    *every* block in the path(s) from EXPR_BB to BB.  Any other definition
@@ -2841,18 +2876,29 @@  compute_code_hoist_data (void)
    paths.  */
 
 static int
-hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
-			   char *visited, int distance, int *bb_size)
+hoist_expr_reaches_here_p (basic_block expr_bb, struct expr *expr,
+			   basic_block bb, sbitmap visited, int distance,
+			   int *bb_size, enum reg_class pressure_class,
+			   int *nregs, bitmap_head *hoisted_bbs)
 {
+  unsigned int i;
   edge pred;
   edge_iterator ei;
+  sbitmap_iterator sbi;
   int visited_allocated_locally = 0;
 
   /* Terminate the search if distance, for which EXPR is allowed to move,
      is exhausted.  */
   if (distance > 0)
     {
-      distance -= bb_size[bb->index];
+      /* Only decrease DISTANCE if BB has high register pressure or EXPR
+	 is a constant integer; otherwise EXPR can be hoisted through BB
+	 at no cost.  */
+      if (flag_ira_hoist_pressure != 1
+	  || (BB_DATA (bb)->max_reg_pressure[pressure_class]
+		>= ira_class_hard_regs_num[pressure_class]
+	      || CONST_INT_P (expr->expr)))
+	distance -= bb_size[bb->index];
 
       if (distance <= 0)
 	return 0;
@@ -2863,7 +2909,8 @@  static int
   if (visited == NULL)
     {
       visited_allocated_locally = 1;
-      visited = XCNEWVEC (char, last_basic_block);
+      visited = sbitmap_alloc (last_basic_block);
+      sbitmap_zero (visited);
     }
 
   FOR_EACH_EDGE (pred, ei, bb->preds)
@@ -2874,23 +2921,37 @@  static int
 	break;
       else if (pred_bb == expr_bb)
 	continue;
-      else if (visited[pred_bb->index])
+      else if (TEST_BIT (visited, pred_bb->index))
 	continue;
-
-      else if (! TEST_BIT (transp[pred_bb->index], expr_index))
+      else if (! TEST_BIT (transp[pred_bb->index], expr->bitmap_index))
 	break;
-
       /* Not killed.  */
       else
 	{
-	  visited[pred_bb->index] = 1;
-	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
-					   visited, distance, bb_size))
+	  SET_BIT (visited, pred_bb->index);
+	  if (! hoist_expr_reaches_here_p (expr_bb, expr, pred_bb,
+					   visited, distance, bb_size,
+					   pressure_class, nregs, hoisted_bbs))
 	    break;
 	}
     }
   if (visited_allocated_locally)
-    free (visited);
+    {
+      /* If EXPR can be hoisted to EXPR_BB, record the basic blocks through
+	 which EXPR is hoisted in HOISTED_BBS.  Also update the register
+	 pressure of the basic blocks newly added to HOISTED_BBS.  */
+      if ((flag_ira_hoist_pressure == 1) && !pred)
+	{
+	  EXECUTE_IF_SET_IN_SBITMAP (visited, 0, i, sbi)
+	    if (!bitmap_bit_p (hoisted_bbs, i))
+	      {
+		bitmap_set_bit (hoisted_bbs, i);
+		BB_DATA (BASIC_BLOCK (i))->max_reg_pressure[pressure_class]
+		    += *nregs;
+	      }
+	}
+      sbitmap_free (visited);
+    }
 
   return (pred == NULL);
 }
@@ -2916,12 +2977,14 @@  hoist_code (void)
   VEC (basic_block, heap) *dom_tree_walk;
   unsigned int dom_tree_walk_index;
   VEC (basic_block, heap) *domby;
-  unsigned int i,j;
+  unsigned int i, j, k;
   struct expr **index_map;
   struct expr *expr;
   int *to_bb_head;
   int *bb_size;
   int changed = 0;
+  struct bb_data *data;
+  bitmap_iterator bi;
 
   /* Compute a mapping from expression number (`bitmap_index') to
      hash table entry.  */
@@ -2977,12 +3040,16 @@  hoist_code (void)
 	{
 	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
+	      int nregs = 0;
+	      enum reg_class pressure_class = NO_REGS;
 	      /* Current expression.  */
 	      struct expr *expr = index_map[i];
 	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
 	      int hoistable = 0;
 	      /* Basic blocks that have occurrences reachable from BB.  */
 	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
+	      /* Basic blocks through which expr is hoisted.  */
+	      bitmap_head _hoisted_bbs, *hoisted_bbs = &_hoisted_bbs;
 	      /* Occurrences reachable from BB.  */
 	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
 	      /* We want to insert the expression into BB only once, so
@@ -2991,6 +3058,8 @@  hoist_code (void)
 	      occr_t occr;
 
 	      bitmap_initialize (from_bbs, 0);
+	      if (flag_ira_hoist_pressure == 1)
+		bitmap_initialize (hoisted_bbs, 0);
 
 	      /* If an expression is computed in BB and is available at end of
 		 BB, hoist all occurrences dominated by BB to BB.  */
@@ -3045,13 +3114,17 @@  hoist_code (void)
 		    max_distance += (bb_size[dominated->index]
 				     - to_bb_head[INSN_UID (occr->insn)]);
 
+		  pressure_class = get_pressure_class_and_nregs (occr->insn,
+								 &nregs);
+
 		  /* Note if the expression would reach the dominated block
 		     unimpared if it was placed at the end of BB.
 
 		     Keep track of how many times this expression is hoistable
 		     from a dominated block into BB.  */
-		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
-						 max_distance, bb_size))
+		  if (hoist_expr_reaches_here_p (bb, expr, dominated, NULL,
+						 max_distance, bb_size, pressure_class,
+						 &nregs, hoisted_bbs))
 		    {
 		      hoistable++;
 		      VEC_safe_push (occr_t, heap,
@@ -3072,6 +3145,13 @@  hoist_code (void)
 		 to nullify any benefit we get from code hoisting.  */
 	      if (hoistable > 1 && dbg_cnt (hoist_insn))
 		{
+		  /* Update the register pressure of the basic block to which
+		     EXPR is hoisted.  */
+		  if (flag_ira_hoist_pressure == 1)
+		    {
+		      data = BB_DATA (bb);
+		      data->max_reg_pressure[pressure_class] += nregs;
+		    }
 		  /* If (hoistable != VEC_length), then there is
 		     an occurrence of EXPR in BB itself.  Don't waste
 		     time looking for LCA in this case.  */
@@ -3089,8 +3169,20 @@  hoist_code (void)
 		    }
 		}
 	      else
+	      {
 		/* Punt, no point hoisting a single occurence.  */
 		VEC_free (occr_t, heap, occrs_to_hoist);
+		/* EXPR will not be hoisted, so restore the register pressure
+		   of the basic blocks recorded in HOISTED_BBS.  */
+		if (flag_ira_hoist_pressure == 1)
+		  EXECUTE_IF_SET_IN_BITMAP (hoisted_bbs, 0, k, bi)
+		    {
+		      data = BB_DATA (BASIC_BLOCK (k));
+		      data->max_reg_pressure[pressure_class] -= nregs;
+		    }
+	      }
+	      if (flag_ira_hoist_pressure == 1)
+		bitmap_clear (hoisted_bbs);
 
 	      insn_inserted_p = 0;
 
@@ -3147,6 +3239,265 @@  hoist_code (void)
   return changed;
 }
 
+/* Return pressure class and number of needed hard registers (through
+   *NREGS) of register REGNO.  */
+static enum reg_class
+get_regno_pressure_class (int regno, int *nregs)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    {
+      enum reg_class pressure_class;
+
+      pressure_class = reg_allocno_class (regno);
+      pressure_class = ira_pressure_class_translate[pressure_class];
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
+      return pressure_class;
+    }
+  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+	   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
+    {
+      *nregs = 1;
+      return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
+    }
+  else
+    {
+      *nregs = 0;
+      return NO_REGS;
+    }
+}
+
+/* Return pressure class and number of hard registers (through *NREGS)
+   for the destination of INSN.  */
+static enum reg_class
+get_pressure_class_and_nregs (rtx insn, int *nregs)
+{
+  rtx reg;
+  enum reg_class pressure_class;
+  rtx set = single_set (insn);
+
+  /* Hoisted insns are expected to have only one set.  */
+  gcc_assert (set != NULL_RTX);
+  reg = SET_DEST (set);
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  if (MEM_P (reg))
+    {
+      *nregs = 0;
+      pressure_class = NO_REGS;
+    }
+  else
+    {
+      if (! REG_P (reg))
+	reg = NULL_RTX;
+      if (reg == NULL_RTX)
+	pressure_class = GENERAL_REGS;
+      else
+	{
+	  pressure_class = reg_allocno_class (REGNO (reg));
+	  pressure_class = ira_pressure_class_translate[pressure_class];
+	}
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
+    }
+  return pressure_class;
+}
+
+/* Increase (if INCR_P) or decrease current register pressure for
+   register REGNO.  */
+static void
+change_pressure (int regno, bool incr_p)
+{
+  int nregs;
+  enum reg_class pressure_class;
+
+  pressure_class = get_regno_pressure_class (regno, &nregs);
+  if (! incr_p)
+    curr_reg_pressure[pressure_class] -= nregs;
+  else
+    {
+      curr_reg_pressure[pressure_class] += nregs;
+      if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  < curr_reg_pressure[pressure_class])
+	BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  = curr_reg_pressure[pressure_class];
+    }
+}
+
+/* Mark REGNO birth.  */
+static void
+mark_regno_live (int regno)
+{
+  if (!bitmap_set_bit (&curr_regs_live, regno))
+    return;
+  change_pressure (regno, true);
+}
+
+/* Mark REGNO death.  */
+static void
+mark_regno_death (int regno)
+{
+  if (! bitmap_clear_bit (&curr_regs_live, regno))
+    return;
+  change_pressure (regno, false);
+}
+
+/* Mark register REG death.  */
+static void
+mark_reg_death (rtx reg)
+{
+  int regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    mark_regno_death (regno);
+  else
+    {
+      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
+
+      while (regno < last)
+	{
+	  mark_regno_death (regno);
+	  regno++;
+	}
+    }
+}
+
+/* Mark setting register REG.  */
+static void
+mark_reg_store (rtx reg, const_rtx setter ATTRIBUTE_UNUSED,
+		void *data ATTRIBUTE_UNUSED)
+{
+  int regno;
+
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+
+  if (! REG_P (reg))
+    return;
+
+  regs_set[n_regs_set++] = reg;
+
+  regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    mark_regno_live (regno);
+  else
+    {
+      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
+
+      while (regno < last)
+	{
+	  mark_regno_live (regno);
+	  regno++;
+	}
+    }
+}
+
+/* Mark clobbering register REG.  */
+static void
+mark_reg_clobber (rtx reg, const_rtx setter, void *data)
+{
+  if (GET_CODE (setter) == CLOBBER)
+    mark_reg_store (reg, setter, data);
+}
+
+/* Calculate register pressure of each basic block.  */
+static void
+calculate_bb_reg_pressure (void)
+{
+  int i;
+  unsigned int j;
+  bitmap_iterator bi;
+  basic_block bb;
+  rtx insn, link;
+
+  ira_setup_eliminable_regset ();
+  bitmap_initialize (&curr_regs_live, &reg_obstack);
+  FOR_EACH_BB (bb)
+    {
+      curr_bb = bb;
+      bb->aux = xcalloc (1, sizeof (struct bb_data));
+      bitmap_copy (&curr_regs_live, DF_LR_IN (bb));
+      for (i = 0; i < ira_pressure_classes_num; i++)
+	curr_reg_pressure[ira_pressure_classes[i]] = 0;
+      EXECUTE_IF_SET_IN_BITMAP (&curr_regs_live, 0, j, bi)
+	change_pressure (j, true);
+
+      FOR_BB_INSNS (bb, insn)
+	{
+	  if (! NONDEBUG_INSN_P (insn))
+	    continue;
+
+	  n_regs_set = 0;
+	  note_stores (PATTERN (insn), mark_reg_clobber, NULL);
+
+	  /* Mark any registers dead after INSN as dead now.  */
+
+	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+	    if (REG_NOTE_KIND (link) == REG_DEAD)
+	      mark_reg_death (XEXP (link, 0));
+
+	  /* Mark any registers set in INSN as live,
+	     and mark them as conflicting with all other live regs.
+	     Clobbers are processed again, so they conflict with
+	     the registers that are set.  */
+
+	  note_stores (PATTERN (insn), mark_reg_store, NULL);
+
+#ifdef AUTO_INC_DEC
+	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+	    if (REG_NOTE_KIND (link) == REG_INC)
+	      mark_reg_store (XEXP (link, 0), NULL_RTX, NULL);
+#endif
+	  while (n_regs_set-- > 0)
+	    {
+	      rtx note = find_regno_note (insn, REG_UNUSED,
+					  REGNO (regs_set[n_regs_set]));
+	      if (! note)
+		continue;
+
+	      mark_reg_death (XEXP (note, 0));
+	    }
+	}
+    }
+  bitmap_clear (&curr_regs_live);
+
+  if (dump_file == NULL)
+    return;
+  FOR_EACH_BB (bb)
+    {
+      fprintf (dump_file, "\n  Basic block %d Pressure: \n", bb->index);
+      for (i = 0; (int) i < ira_pressure_classes_num; i++)
+	{
+	  enum reg_class pressure_class;
+
+	  pressure_class = ira_pressure_classes[i];
+	  if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0)
+	    continue;
+	  fprintf (dump_file, " %s=%d", reg_class_names[pressure_class],
+		   BB_DATA (bb)->max_reg_pressure[pressure_class]);
+	}
+      fprintf (dump_file, "\n");
+    }
+}
+
+/* Free struct bb_data in each basic block.  */
+static void
+free_bb_data (void)
+{
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    {
+      struct bb_data *data = BB_DATA (bb);
+      if (!data)
+	continue;
+
+      free (data);
+      bb->aux = NULL;
+    }
+}
+
 /* Top level routine to perform one code hoisting (aka unification) pass
 
    Return nonzero if a change was made.  */
@@ -3166,6 +3517,15 @@  one_code_hoisting_pass (void)
 
   doing_code_hoisting_p = true;
 
+  /* Calculate register pressure for each basic block.  */
+  if (flag_ira_hoist_pressure == 1)
+    {
+      regstat_init_n_sets_and_refs ();
+      ira_set_pseudo_classes (false, dump_file);
+      calculate_bb_reg_pressure ();
+      regstat_free_n_sets_and_refs ();
+    }
+
   /* We need alias.  */
   init_alias_analysis ();
 
@@ -3186,6 +3546,11 @@  one_code_hoisting_pass (void)
       free_code_hoist_mem ();
     }
 
+  if (flag_ira_hoist_pressure == 1)
+    {
+      free_bb_data ();
+      free_reg_info ();
+    }
   free_hash_table (&expr_hash_table);
   free_gcse_mem ();
   obstack_free (&gcse_obstack, NULL);
Index: gcc/loop-invariant.c
===================================================================
--- gcc/loop-invariant.c	(revision 191816)
+++ gcc/loop-invariant.c	(working copy)
@@ -1915,7 +1915,7 @@  move_loop_invariants (void)
     {
       df_analyze ();
       regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (dump_file);
+      ira_set_pseudo_classes (true, dump_file);
       calculate_loop_reg_pressure ();
       regstat_free_n_sets_and_refs ();
     }
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 191816)
+++ gcc/common.opt	(working copy)
@@ -1395,6 +1395,11 @@  Enum(ira_region) String(all) Value(IRA_REGION_ALL)
 EnumValue
 Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
 
+fira-hoist-pressure
+Common Report Var(flag_ira_hoist_pressure) Init(-1)
+Use IRA based register pressure calculation
+in hoist optimizations.
+
 fira-loop-pressure
 Common Report Var(flag_ira_loop_pressure)
 Use IRA based register pressure calculation
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	(revision 191816)
+++ gcc/ira.c	(working copy)
@@ -4164,7 +4164,7 @@  ira (FILE *f)
   crtl->is_leaf = leaf_function_p ();
 
   if (resize_reg_info () && flag_ira_loop_pressure)
-    ira_set_pseudo_classes (ira_dump_file);
+    ira_set_pseudo_classes (true, ira_dump_file);
 
   rebuild_p = update_equiv_regs ();
 
Index: gcc/ira.h
===================================================================
--- gcc/ira.h	(revision 191816)
+++ gcc/ira.h	(working copy)
@@ -125,7 +125,7 @@  extern void ira_init (void);
 extern void ira_finish_once (void);
 extern void ira_setup_eliminable_regset (void);
 extern rtx ira_eliminate_regs (rtx, enum machine_mode);
-extern void ira_set_pseudo_classes (FILE *);
+extern void ira_set_pseudo_classes (bool, FILE *);
 extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
 
 extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	(revision 191816)
+++ gcc/ira-costs.c	(working copy)
@@ -2048,9 +2048,10 @@  ira_costs (void)
   ira_free (total_allocno_costs);
 }
 
-/* Entry function which defines classes for pseudos.  */
+/* Entry function which defines classes for pseudos.
+   Set pseudo_classes_defined_p only if DEFINE_PSEUDO_CLASSES is true.  */
 void
-ira_set_pseudo_classes (FILE *dump_file)
+ira_set_pseudo_classes (bool define_pseudo_classes, FILE *dump_file)
 {
   allocno_p = false;
   internal_flag_ira_verbose = flag_ira_verbose;
@@ -2059,7 +2060,9 @@  void
   initiate_regno_cost_classes ();
   find_costs_and_classes (dump_file);
   finish_regno_cost_classes ();
-  pseudo_classes_defined_p = true;
+  if (define_pseudo_classes)
+    pseudo_classes_defined_p = true;
+
   finish_costs ();
 }
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 191816)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2021,6 +2021,11 @@  arm_option_override (void)
       && current_tune->num_prefetch_slots > 0)
     flag_prefetch_loop_arrays = 1;
 
+  /* Enable pressure-directed hoisting on Thumb1 when optimizing for size.  */
+  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
+      && flag_ira_hoist_pressure == -1)
+    flag_ira_hoist_pressure = 1;
+
   /* Set up parameters to be used in prefetching algorithm.  Do not override the
      defaults unless we are tuning for a core we have researched values for.  */
   if (current_tune->num_prefetch_slots > 0)