Patchwork [ira-improv] patch for new code dealing with hard reg preferences

login
register
mail settings
Submitter Vladimir Makarov
Date April 5, 2013, 8:14 p.m.
Message ID <515F30C2.2090205@redhat.com>
Download mbox | patch
Permalink /patch/234270/
State New
Headers show

Comments

Vladimir Makarov - April 5, 2013, 8:14 p.m.
The following patch adds a new functionality to IRA to improve RA in
presence of hard registers in RTL.  IRA already had some mechanism
dealing with hard regs but it affected only pseudos in *the same insn*
containing hard register.  This technique was not good enough to
remove regmove pass code making transformations for matching
constraints (e.g. 2-op insn machine) although IRA can make better
decision for pseudos but still not enough good for hard registers.  So
just removing the code in regmove resulted in worse generated code
performance.

   On the other hand there are performance PRs which is a result of bad
decision in pre-mature optimization of regmove pass for matching
constraints.

   We already have an IRA code propagating pseudo preferences to other
pseudos through net of copies.  We need the same code for better
dealing with hard register preferences.  The patch adds such code.  I
could modify structure ira_copy for this but decided to use new
separate structures as the copy structure is too big and we don't need
most of its fields for hard-register preferences.

   By the way, LRA has already code for dealing with hard register
preferences but in simplified way as we work on RTL mostly locally.

   This is not a final patch as the big part of regmove code and old
IRA code for hard register preferences are not removed.  This code is
just switched off by -fira-hard-reg-pref option.  I am going to play a
bit more with the new code and when I decide to submit it to trunk, I
remove the unnecessary code and the option.

   Right now, the results look pretty good for SPEC2000 on x86/x86-64.
The compiler is always faster (as a regmove RTL pass is switched
off), generates in average smaller code (the best is 0.05% decrease
for x86-64 SPECFP2000), and generates in average better code (0.6%
and 0.4% SPECFP2000 improvement on x86 and x86-64 correspondingly).
The results were gotten on Intel Core I7-2600 in -O3 optimization
mode.

   The patch was successfully bootstrapped on x86/x86-64 with the new
code switched on.

Committed as rev. 197525.


2013-04-05  Vladimir Makarov  <vmakarov@redhat.com>

         * common.opt (fira-hard-reg-pref): New.
         * regmove.c (regmove_optimize): Don't call regmove_forward_pass
         for flag_ira_hard_reg_pref.
         * ira-int.h (ira_pref_t, allocno_prefs, ALLOCNO_PREFS): New.
         (struct ira_allocno_pref, ira_prefs, ira_prefs_num): New.
         (ira_debug_pref, ira_debug_prefs, ira_debug_allocno_prefs): New.
         (ira_create_pref, ira_create_copy): New.
         (ira_add_allocno_copy_to_list): Remove.
         (ira_swap_allocno_copy_ends_if_necessary): Ditto.
         (ira_pref_iterator, ira_pref_iter_init, ira_pref_iter_cond): New.
         (FOR_EACH_PREF): New.
         * ira-build.c (ira_prefs, ira_prefs_num): New.
         (ira_create_allocno): Reset preferences.
         (pref_pool, pref_vec, initiate_prefs, find_allocno_pref)
         (ira_create_pref, add_allocno_pref_to_list, ira_add_allocno_pref)
         (print_pref, ira_debug_pref, print_prefs, ira_debug_prefs)
         (print_allocno_prefs, ira_debug_allocno_prefs, finish_pref)
         (finish_prefs): New.
         (ira_add_allocno_copy_to_list): Rename to
         add_allocno_copy_to_list.  Make static.
         (ira_swap_allocno_copy_ends_if_necessary): Rename to
         swap_allocno_copy_ends_if_necessary.  Make static.
         (ira_build, ira_destroy): Initialize and finish the prefs.
         * ira-color.c (update_allocno_cost, update_costs_from_allocno):
         New.
         (update_copy_costs): Rename to update_costs_from_copies.  Use
         update_costs_from_allocno.
         (update_costs_from_prefs, update_costs_from_copies): New.
         (assign_hard_reg): Call update_costs_from_prefs.
         (color_allocnos, color_pass): Ditto.
         * ira-costs.c (find_costs_and_classes): Improve code.  Add code for
         hard reg preferences.
         (process_bb_node_for_hard_reg_moves): Add code for hard reg
         preferences.
Jeff Law - April 5, 2013, 8:48 p.m.
On 04/05/2013 02:14 PM, Vladimir Makarov wrote:
>    The following patch adds a new functionality to IRA to improve RA in
> presence of hard registers in RTL.  IRA already had some mechanism
> dealing with hard regs but it affected only pseudos in *the same insn*
> containing hard register.  This technique was not good enough to
> remove regmove pass code making transformations for matching
> constraints (e.g. 2-op insn machine) although IRA can make better
> decision for pseudos but still not enough good for hard registers.  So
> just removing the code in regmove resulted in worse generated code
> performance.
>
>    On the other hand there are performance PRs which is a result of bad
> decision in pre-mature optimization of regmove pass for matching
> constraints.
>
>    We already have an IRA code propagating pseudo preferences to other
> pseudos through net of copies.  We need the same code for better
> dealing with hard register preferences.  The patch adds such code.  I
> could modify structure ira_copy for this but decided to use new
> separate structures as the copy structure is too big and we don't need
> most of its fields for hard-register preferences.
>
>    By the way, LRA has already code for dealing with hard register
> preferences but in simplified way as we work on RTL mostly locally.
>
>    This is not a final patch as the big part of regmove code and old
> IRA code for hard register preferences are not removed.  This code is
> just switched off by -fira-hard-reg-pref option.  I am going to play a
> bit more with the new code and when I decide to submit it to trunk, I
> remove the unnecessary code and the option.
>
>    Right now, the results look pretty good for SPEC2000 on x86/x86-64.
> The compiler is always faster (as a regmove RTL pass is switched
> off), generates in average smaller code (the best is 0.05% decrease
> for x86-64 SPECFP2000), and generates in average better code (0.6%
> and 0.4% SPECFP2000 improvement on x86 and x86-64 correspondingly).
> The results were gotten on Intel Core I7-2600 in -O3 optimization
> mode.
>
>    The patch was successfully bootstrapped on x86/x86-64 with the new
> code switched on.
>
> Committed as rev. 197525.
Very cool.  Presumably this is meant to kill optimize_reg_copy_*. 
That'd leave the fixup_match bits (which might be dead with the SLSR in 
the mainline.  Does that just leave try_auto_increment?

Jeff
Vladimir Makarov - April 5, 2013, 10:14 p.m.
On 13-04-05 4:48 PM, Jeff Law wrote:
> On 04/05/2013 02:14 PM, Vladimir Makarov wrote:
>>    The following patch adds a new functionality to IRA to improve RA in
>> presence of hard registers in RTL.  IRA already had some mechanism
>> dealing with hard regs but it affected only pseudos in *the same insn*
>> containing hard register.  This technique was not good enough to
>> remove regmove pass code making transformations for matching
>> constraints (e.g. 2-op insn machine) although IRA can make better
>> decision for pseudos but still not enough good for hard registers.  So
>> just removing the code in regmove resulted in worse generated code
>> performance.
>>
>>    On the other hand there are performance PRs which is a result of bad
>> decision in pre-mature optimization of regmove pass for matching
>> constraints.
>>
>>    We already have an IRA code propagating pseudo preferences to other
>> pseudos through net of copies.  We need the same code for better
>> dealing with hard register preferences.  The patch adds such code.  I
>> could modify structure ira_copy for this but decided to use new
>> separate structures as the copy structure is too big and we don't need
>> most of its fields for hard-register preferences.
>>
>>    By the way, LRA has already code for dealing with hard register
>> preferences but in simplified way as we work on RTL mostly locally.
>>
>>    This is not a final patch as the big part of regmove code and old
>> IRA code for hard register preferences are not removed.  This code is
>> just switched off by -fira-hard-reg-pref option.  I am going to play a
>> bit more with the new code and when I decide to submit it to trunk, I
>> remove the unnecessary code and the option.
>>
>>    Right now, the results look pretty good for SPEC2000 on x86/x86-64.
>> The compiler is always faster (as a regmove RTL pass is switched
>> off), generates in average smaller code (the best is 0.05% decrease
>> for x86-64 SPECFP2000), and generates in average better code (0.6%
>> and 0.4% SPECFP2000 improvement on x86 and x86-64 correspondingly).
>> The results were gotten on Intel Core I7-2600 in -O3 optimization
>> mode.
>>
>>    The patch was successfully bootstrapped on x86/x86-64 with the new
>> code switched on.
>>
>> Committed as rev. 197525.
> Very cool.  Presumably this is meant to kill optimize_reg_copy_*. 
> That'd leave the fixup_match bits (which might be dead with the SLSR 
> in the mainline.  Does that just leave try_auto_increment?
>
Yes, I think so.  Unfortunately, no IRA/LRA has analogous code.  As I 
remember this code still important for targets with small displacements 
and even for x86/x86-64: although it has no practically constraints.on 
displacements, it affects x86/x86-64 code size and as a consequence the 
performance too (through better code locality).

Still removing one pass through RTL is a good result although removing 2 
passes would be better.

Patch

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 195439)
+++ ChangeLog	(working copy)
@@ -1,3 +1,40 @@ 
+2013-04-05  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* common.opt (fira-hard-reg-pref): New.
+	* regmove.c (regmove_optimize): Don't call regmove_forward_pass
+	for flag_ira_hard_reg_pref.
+	* ira-int.h (ira_pref_t, allocno_prefs, ALLOCNO_PREFS): New.
+	(struct ira_allocno_pref, ira_prefs, ira_prefs_num): New.
+	(ira_debug_pref, ira_debug_prefs, ira_debug_allocno_prefs): New.
+	(ira_create_pref, ira_create_copy): New.
+	(ira_add_allocno_copy_to_list): Remove.
+	(ira_swap_allocno_copy_ends_if_necessary): Ditto.
+	(ira_pref_iterator, ira_pref_iter_init, ira_pref_iter_cond): New.
+	(FOR_EACH_PREF): New.
+	* ira-build.c (ira_prefs, ira_prefs_num): New.
+	(ira_create_allocno): Reset preferences.
+	(pref_pool, pref_vec, initiate_prefs, find_allocno_pref)
+	(ira_create_pref, add_allocno_pref_to_list, ira_add_allocno_pref)
+	(print_pref, ira_debug_pref, print_prefs, ira_debug_prefs)
+	(print_allocno_prefs, ira_debug_allocno_prefs, finish_pref)
+	(finish_prefs): New.
+	(ira_add_allocno_copy_to_list): Rename to
+	add_allocno_copy_to_list.  Make static.
+	(ira_swap_allocno_copy_ends_if_necessary): Rename to
+	swap_allocno_copy_ends_if_necessary.  Make static.
+	(ira_build, ira_destroy): Initialize and finish the prefs.
+	* ira-color.c (update_allocno_cost, update_costs_from_allocno):
+	New.
+	(update_copy_costs): Rename to update_costs_from_copies.  Use
+	update_costs_from_allocno.
+	(update_costs_from_prefs, update_costs_from_copies): New.
+	(assign_hard_reg): Call update_costs_from_prefs.
+	(color_allocnos, color_pass): Ditto.
+	* ira-costs.c (find_costs_and_classes): Improve code.  Add code for
+	hard reg preferences.
+	(process_bb_node_for_hard_reg_moves): Add code for hard reg
+	preferences.
+
 2013-01-24  Shenghou Ma  <minux.ma@gmail.com>
 
 	* doc/invoke.texi: fix typo.
Index: common.opt
===================================================================
--- common.opt	(revision 195438)
+++ common.opt	(working copy)
@@ -1470,6 +1470,10 @@ 
 Common Report Var(flag_errno_math) Init(1) Optimization SetByCombined
 Set errno after built-in math functions
 
+fira-hard-reg-pref
+Common Report Var(flag_ira_hard_reg_pref) Init(1) Optimization
+Use new IRA code for hard reg preference and switch of remove constraint matching code
+
 fmax-errors=
 Common Joined RejectNegative UInteger Var(flag_max_errors)
 -fmax-errors=<number>	Maximum number of errors to report
Index: ira-build.c
===================================================================
--- ira-build.c	(revision 195438)
+++ ira-build.c	(working copy)
@@ -76,6 +76,13 @@ 
 /* Map a conflict id to its conflict record.  */
 ira_object_t *ira_object_id_map;
 
+/* Array of references to all allocno preferences.  The order number
+   of the preference corresponds to the index in the array.  */
+ira_pref_t *ira_prefs;
+
+/* Size of the previous array.  */
+int ira_prefs_num;
+
 /* Array of references to all copies.  The order number of the copy
    corresponds to the index in the array.  Removed copies have NULL
    element value.  */
@@ -514,6 +521,7 @@ 
   ALLOCNO_BAD_SPILL_P (a) = false;
   ALLOCNO_ASSIGNED_P (a) = false;
   ALLOCNO_MODE (a) = (regno < 0 ? VOIDmode : PSEUDO_REGNO_MODE (regno));
+  ALLOCNO_PREFS (a) = NULL;
   ALLOCNO_COPIES (a) = NULL;
   ALLOCNO_HARD_REG_COSTS (a) = NULL;
   ALLOCNO_CONFLICT_HARD_REG_COSTS (a) = NULL;
@@ -1162,6 +1170,158 @@ 
 
 
 
+/* Pools for allocno preferences.  */
+static alloc_pool pref_pool;
+
+/* Vec containing references to all created preferences.  It is a
+   container of array ira_prefs.  */
+static vec<ira_pref_t> pref_vec;
+
+/* The function initializes data concerning allocno prefs.  */
+static void
+initiate_prefs (void)
+{
+  pref_pool
+    = create_alloc_pool ("prefs", sizeof (struct ira_allocno_pref), 100);
+  pref_vec.create (get_max_uid ());
+  ira_prefs = NULL;
+  ira_prefs_num = 0;
+}
+
+/* Return pref for A and HARD_REGNO if any.  */
+static ira_pref_t
+find_allocno_pref (ira_allocno_t a, int hard_regno)
+{
+  ira_pref_t pref;
+
+  for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
+    if (pref->allocno == a && pref->hard_regno == hard_regno)
+      return pref;
+  return NULL;
+}
+
+/* Create and return pref with given attributes A, HARD_REGNO, and FREQ.  */
+ira_pref_t
+ira_create_pref (ira_allocno_t a, int hard_regno, int freq)
+{
+  ira_pref_t pref;
+
+  pref = (ira_pref_t) pool_alloc (pref_pool);
+  pref->num = ira_prefs_num;
+  pref->allocno = a;
+  pref->hard_regno = hard_regno;
+  pref->freq = freq;
+  pref_vec.safe_push (pref);
+  ira_prefs = pref_vec.address ();
+  ira_prefs_num = pref_vec.length ();
+  return pref;
+}
+
+/* Attach a pref PREF to the cooresponding allocno.  */
+static void
+add_allocno_pref_to_list (ira_pref_t pref)
+{
+  ira_allocno_t a = pref->allocno;
+
+  pref->next_pref = ALLOCNO_PREFS (a);
+  ALLOCNO_PREFS (a) = pref;
+}
+
+/* Create (or update frequency if the pref already exists) the pref of
+   allocnos A preferring HARD_REGNO with frequency FREQ.  */
+void
+ira_add_allocno_pref (ira_allocno_t a, int hard_regno, int freq)
+{
+  ira_pref_t pref;
+
+  if (freq <= 0)
+    return;
+  if ((pref = find_allocno_pref (a, hard_regno)) != NULL)
+    {
+      pref->freq += freq;
+      return;
+    }
+  pref = ira_create_pref (a, hard_regno, freq);
+  ira_assert (a != NULL);
+  add_allocno_pref_to_list (pref);
+}
+
+/* Print info about PREF into file F.  */
+static void
+print_pref (FILE *f, ira_pref_t pref)
+{
+  fprintf (f, "  pref%d:a%d(r%d)<-hr%d@%d\n", pref->num,
+	   ALLOCNO_NUM (pref->allocno), ALLOCNO_REGNO (pref->allocno),
+	   pref->hard_regno, pref->freq);
+}
+
+/* Print info about PREF into stderr.  */
+void
+ira_debug_pref (ira_pref_t pref)
+{
+  print_pref (stderr, pref);
+}
+
+/* Print info about all prefs into file F.  */
+static void
+print_prefs (FILE *f)
+{
+  ira_pref_t pref;
+  ira_pref_iterator pi;
+
+  FOR_EACH_PREF (pref, pi)
+    print_pref (f, pref);
+}
+
+/* Print info about all prefs into stderr.  */
+void
+ira_debug_prefs (void)
+{
+  print_prefs (stderr);
+}
+
+/* Print info about prefs involving allocno A into file F.  */
+static void
+print_allocno_prefs (FILE *f, ira_allocno_t a)
+{
+  ira_pref_t pref;
+
+  fprintf (f, " a%d(r%d):", ALLOCNO_NUM (a), ALLOCNO_REGNO (a));
+  for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
+    fprintf (f, " pref%d:hr%d@%d", pref->num, pref->hard_regno, pref->freq);
+  fprintf (f, "\n");
+}
+
+/* Print info about prefs involving allocno A into stderr.  */
+void
+ira_debug_allocno_prefs (ira_allocno_t a)
+{
+  print_allocno_prefs (stderr, a);
+}
+
+/* The function frees memory allocated for PREF.  */
+static void
+finish_pref (ira_pref_t pref)
+{
+  pool_free (pref_pool, pref);
+}
+
+
+/* Free memory allocated for all prefs.  */
+static void
+finish_prefs (void)
+{
+  ira_pref_t pref;
+  ira_pref_iterator pi;
+
+  FOR_EACH_PREF (pref, pi)
+    finish_pref (pref);
+  pref_vec.release ();
+  free_alloc_pool (pref_pool);
+}
+
+
+
 /* Pools for copies.  */
 static alloc_pool copy_pool;
 
@@ -1234,8 +1394,8 @@ 
 }
 
 /* Attach a copy CP to allocnos involved into the copy.  */
-void
-ira_add_allocno_copy_to_list (ira_copy_t cp)
+static void
+add_allocno_copy_to_list (ira_copy_t cp)
 {
   ira_allocno_t first = cp->first, second = cp->second;
 
@@ -1263,8 +1423,8 @@ 
 
 /* Make a copy CP a canonical copy where number of the
    first allocno is less than the second one.  */
-void
-ira_swap_allocno_copy_ends_if_necessary (ira_copy_t cp)
+static void
+swap_allocno_copy_ends_if_necessary (ira_copy_t cp)
 {
   ira_allocno_t temp;
   ira_copy_t temp_cp;
@@ -1304,8 +1464,8 @@ 
   cp = ira_create_copy (first, second, freq, constraint_p, insn,
 			loop_tree_node);
   ira_assert (first != NULL && second != NULL);
-  ira_add_allocno_copy_to_list (cp);
-  ira_swap_allocno_copy_ends_if_necessary (cp);
+  add_allocno_copy_to_list (cp);
+  swap_allocno_copy_ends_if_necessary (cp);
   return cp;
 }
 
@@ -3099,8 +3259,8 @@ 
       ira_assert
 	(ALLOCNO_LOOP_TREE_NODE (cp->first) == ira_loop_tree_root
 	 && ALLOCNO_LOOP_TREE_NODE (cp->second) == ira_loop_tree_root);
-      ira_add_allocno_copy_to_list (cp);
-      ira_swap_allocno_copy_ends_if_necessary (cp);
+      add_allocno_copy_to_list (cp);
+      swap_allocno_copy_ends_if_necessary (cp);
     }
   rebuild_regno_allocno_maps ();
   if (ira_max_point != ira_max_point_before_emit)
@@ -3188,6 +3348,7 @@ 
   df_analyze ();
   initiate_cost_vectors ();
   initiate_allocnos ();
+  initiate_prefs ();
   initiate_copies ();
   create_loop_tree_nodes ();
   form_loop_tree ();
@@ -3272,6 +3433,7 @@ 
 ira_destroy (void)
 {
   finish_loop_tree_nodes ();
+  finish_prefs ();
   finish_copies ();
   finish_allocnos ();
   finish_cost_vectors ();
Index: ira-color.c
===================================================================
--- ira-color.c	(revision 195438)
+++ ira-color.c	(working copy)
@@ -1219,29 +1219,44 @@ 
   return true;
 }
 
-/* Update the cost of allocnos to increase chances to remove some
-   copies as the result of subsequent assignment.  */
+/* Increase costs of HARD_REGNO by UPDATE_COST for ALLOCNO.  Return
+   true if we really modified the cost.  */
+static bool
+update_allocno_cost (ira_allocno_t allocno, int hard_regno, int update_cost)
+{
+  int i;
+  enum reg_class aclass = ALLOCNO_CLASS (allocno);
+
+  i = ira_class_hard_reg_index[aclass][hard_regno];
+  if (i < 0)
+    return false;
+  ira_allocate_and_set_or_copy_costs
+    (&ALLOCNO_UPDATED_HARD_REG_COSTS (allocno), aclass,
+     ALLOCNO_UPDATED_CLASS_COST (allocno),
+     ALLOCNO_HARD_REG_COSTS (allocno));
+  ira_allocate_and_set_or_copy_costs
+    (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (allocno),
+     aclass, 0, ALLOCNO_CONFLICT_HARD_REG_COSTS (allocno));
+  ALLOCNO_UPDATED_HARD_REG_COSTS (allocno)[i] += update_cost;
+  ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (allocno)[i]
+    += update_cost;
+  return true;
+}
+
+/* Update (decrease if DECR_P) HARD_REGNO cost of allocnos connected
+   by copies to ALLOCNO to increase chances to remove some copies as
+   the result of subsequent assignment.  */
 static void
-update_copy_costs (ira_allocno_t allocno, bool decr_p)
+update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
+			   int divisor, bool decr_p)
 {
-  int i, cost, update_cost, hard_regno, divisor;
+  int cost, update_cost;
   enum machine_mode mode;
   enum reg_class rclass, aclass;
   ira_allocno_t another_allocno;
   ira_copy_t cp, next_cp;
 
-  hard_regno = ALLOCNO_HARD_REGNO (allocno);
-  ira_assert (hard_regno >= 0);
-
-  aclass = ALLOCNO_CLASS (allocno);
-  if (aclass == NO_REGS)
-    return;
-  i = ira_class_hard_reg_index[aclass][hard_regno];
-  ira_assert (i >= 0);
   rclass = REGNO_REG_CLASS (hard_regno);
-
-  start_update_cost ();
-  divisor = 1;
   do
     {
       mode = ALLOCNO_MODE (allocno);
@@ -1277,26 +1292,54 @@ 
 	  if (update_cost == 0)
 	    continue;
 
-	  ira_allocate_and_set_or_copy_costs
-	    (&ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno), aclass,
-	     ALLOCNO_UPDATED_CLASS_COST (another_allocno),
-	     ALLOCNO_HARD_REG_COSTS (another_allocno));
-	  ira_allocate_and_set_or_copy_costs
-	    (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
-	     aclass, 0, ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno));
-	  i = ira_class_hard_reg_index[aclass][hard_regno];
-	  if (i < 0)
+	  if (! update_allocno_cost (another_allocno, hard_regno, update_cost))
 	    continue;
-	  ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno)[i] += update_cost;
-	  ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i]
-	    += update_cost;
-
 	  queue_update_cost (another_allocno, divisor * COST_HOP_DIVISOR);
 	}
     }
   while (get_next_update_cost (&allocno, &divisor));
 }
 
+/* Update (decrease if DECR_P) preferred ALLOCNO hard register costs
+   and costs of allocnos connected to ALLOCNO through copy.  */
+static void
+update_costs_from_prefs (ira_allocno_t allocno, bool decr_p)
+{
+  int cost;
+  enum machine_mode mode;
+  enum reg_class rclass, aclass;
+  ira_pref_t pref;
+
+  mode = ALLOCNO_MODE (allocno);
+  aclass = ALLOCNO_CLASS (allocno);
+  start_update_cost ();
+  for (pref = ALLOCNO_PREFS (allocno); pref != NULL; pref = pref->next_pref)
+    {
+      rclass = REGNO_REG_CLASS (pref->hard_regno);
+      cost = pref->freq * ira_register_move_cost[mode][rclass][aclass];
+      if (decr_p)
+	cost = -cost;
+      if (update_allocno_cost (allocno, pref->hard_regno, cost))
+	update_costs_from_allocno (allocno, pref->hard_regno,
+				   COST_HOP_DIVISOR, decr_p);
+    }
+}
+
+/* Update (decrease if DECR_P) the cost of allocnos connected to
+   ALLOCNO through copies to increase chances to remove some copies as
+   the result of subsequent assignment.  ALLOCNO was just assigned to
+   a hard register.  */
+static void
+update_costs_from_copies (ira_allocno_t allocno, bool decr_p)
+{
+  int hard_regno;
+
+  hard_regno = ALLOCNO_HARD_REGNO (allocno);
+  ira_assert (hard_regno >= 0 && ALLOCNO_CLASS (allocno) != NO_REGS);
+  start_update_cost ();
+  update_costs_from_allocno (allocno, hard_regno, 1, decr_p);
+}
+
 /* This function updates COSTS (decrease if DECR_P) for hard_registers
    of ACLASS by conflict costs of the unassigned allocnos
    connected by copies with allocnos in update_cost_queue.  This
@@ -1711,7 +1754,8 @@ 
   ALLOCNO_HARD_REGNO (a) = best_hard_regno;
   ALLOCNO_ASSIGNED_P (a) = true;
   if (best_hard_regno >= 0)
-    update_copy_costs (a, true);
+    update_costs_from_copies (a, true);
+  update_costs_from_prefs (a, false);
   ira_assert (ALLOCNO_CLASS (a) == aclass);
   /* We don't need updated costs anymore: */
   ira_free_allocno_updated_costs (a);
@@ -2602,7 +2646,10 @@ 
 	{
 	  a = ira_allocnos[i];
 	  if (ALLOCNO_CLASS (a) != NO_REGS && ! empty_profitable_hard_regs (a))
-	    ALLOCNO_COLOR_DATA (a)->in_graph_p = true;
+	    {
+	      ALLOCNO_COLOR_DATA (a)->in_graph_p = true;
+	      update_costs_from_prefs (a, true);
+	    }
 	  else
 	    {
 	      ALLOCNO_HARD_REGNO (a) = -1;
@@ -2769,7 +2816,7 @@ 
 	    ALLOCNO_HARD_REGNO (subloop_allocno) = hard_regno;
 	    ALLOCNO_ASSIGNED_P (subloop_allocno) = true;
 	    if (hard_regno >= 0)
-	      update_copy_costs (subloop_allocno, true);
+	      update_costs_from_copies (subloop_allocno, true);
 	    /* We don't need updated costs anymore: */
 	    ira_free_allocno_updated_costs (subloop_allocno);
 	  }
@@ -2813,7 +2860,7 @@ 
 		  ALLOCNO_HARD_REGNO (subloop_allocno) = hard_regno;
 		  ALLOCNO_ASSIGNED_P (subloop_allocno) = true;
 		  if (hard_regno >= 0)
-		    update_copy_costs (subloop_allocno, true);
+		    update_costs_from_copies (subloop_allocno, true);
 		  /* We don't need updated costs anymore: */
 		  ira_free_allocno_updated_costs (subloop_allocno);
 		}
@@ -2829,7 +2876,7 @@ 
 		  ALLOCNO_HARD_REGNO (subloop_allocno) = hard_regno;
 		  ALLOCNO_ASSIGNED_P (subloop_allocno) = true;
 		  if (hard_regno >= 0)
-		    update_copy_costs (subloop_allocno, true);
+		    update_costs_from_copies (subloop_allocno, true);
 		  /* We don't need updated costs anymore: */
 		  ira_free_allocno_updated_costs (subloop_allocno);
 		}
@@ -3810,7 +3857,7 @@ 
 	       ? ALLOCNO_CLASS_COST (a)
 	       : ALLOCNO_HARD_REG_COSTS (a)
 	         [ira_class_hard_reg_index[aclass][old_hard_regno]]);
-      update_copy_costs (a, false);
+      update_costs_from_copies (a, false);
     }
   ira_overall_cost -= cost;
   ALLOCNO_HARD_REGNO (a) = hard_regno;
@@ -3825,7 +3872,7 @@ 
 	       ? ALLOCNO_CLASS_COST (a)
 	       : ALLOCNO_HARD_REG_COSTS (a)
 	         [ira_class_hard_reg_index[aclass][hard_regno]]);
-      update_copy_costs (a, true);
+      update_costs_from_copies (a, true);
     }
   else
     /* Reload changed class of the allocno.  */
Index: ira-costs.c
===================================================================
--- ira-costs.c	(revision 195438)
+++ ira-costs.c	(working copy)
@@ -1733,14 +1733,15 @@ 
 	       a != NULL;
 	       a = ALLOCNO_NEXT_REGNO_ALLOCNO (a))
 	    {
-	      a_num = ALLOCNO_NUM (a);
-	      if (regno_aclass[i] == NO_REGS)
+	      enum reg_class aclass = regno_aclass[i];
+	      int a_num = ALLOCNO_NUM (a);
+	      int *total_a_costs = COSTS (total_allocno_costs, a_num)->cost;
+	      int *a_costs = COSTS (costs, a_num)->cost;
+	
+	      if (aclass == NO_REGS)
 		best = NO_REGS;
 	      else
 		{
-		  int *total_a_costs = COSTS (total_allocno_costs, a_num)->cost;
-		  int *a_costs = COSTS (costs, a_num)->cost;
-		  
 		  /* Finding best class which is subset of the common
 		     class.  */
 		  best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
@@ -1749,7 +1750,7 @@ 
 		  for (k = 0; k < cost_classes_ptr->num; k++)
 		    {
 		      rclass = cost_classes[k];
-		      if (! ira_class_subset_p[rclass][regno_aclass[i]])
+		      if (! ira_class_subset_p[rclass][aclass])
 			continue;
 		      /* Ignore classes that are too small or invalid
 			 for this operand.  */
@@ -1784,9 +1785,25 @@ 
 			     ALLOCNO_LOOP_TREE_NODE (a)->loop_num);
 		  fprintf (dump_file, ") best %s, allocno %s\n",
 			   reg_class_names[best],
-			   reg_class_names[regno_aclass[i]]);
+			   reg_class_names[aclass]);
 		}
 	      pref[a_num] = best;
+	      if (flag_ira_hard_reg_pref && pass != start && best != aclass
+		  && ira_class_hard_regs_num[best] > 0
+		  && (ira_reg_class_max_nregs[best][ALLOCNO_MODE (a)]
+		      >= ira_class_hard_regs_num[best]))
+		{
+		  int ind = cost_classes_ptr->index[aclass];
+
+		  ira_assert (ind >= 0);
+		  ira_add_allocno_pref (a, ira_class_hard_regs[best][0],
+					(a_costs[ind] - ALLOCNO_CLASS_COST (a))
+					/ (ira_register_move_cost
+					   [ALLOCNO_MODE (a)][best][aclass]));
+		  for (k = 0; k < cost_classes_ptr->num; k++)
+		    if (ira_class_subset_p[cost_classes[k]][best])
+		      a_costs[k] = a_costs[ind];
+		}
 	    }
 	}
       
@@ -1812,11 +1829,10 @@ 
 static void
 process_bb_node_for_hard_reg_moves (ira_loop_tree_node_t loop_tree_node)
 {
-  int i, freq, cost, src_regno, dst_regno, hard_regno;
+  int i, freq, src_regno, dst_regno, hard_regno;
   bool to_p;
   ira_allocno_t a;
-  enum reg_class rclass, hard_reg_class;
-  enum machine_mode mode;
+  enum reg_class rclass;
   basic_block bb;
   rtx insn, set, src, dst;
 
@@ -1843,15 +1859,15 @@ 
 	  && src_regno < FIRST_PSEUDO_REGISTER)
 	{
 	  hard_regno = src_regno;
+	  a = ira_curr_regno_allocno_map[dst_regno];
 	  to_p = true;
-	  a = ira_curr_regno_allocno_map[dst_regno];
 	}
       else if (src_regno >= FIRST_PSEUDO_REGISTER
 	       && dst_regno < FIRST_PSEUDO_REGISTER)
 	{
 	  hard_regno = dst_regno;
+	  a = ira_curr_regno_allocno_map[src_regno];
 	  to_p = false;
-	  a = ira_curr_regno_allocno_map[src_regno];
 	}
       else
 	continue;
@@ -1861,20 +1877,29 @@ 
       i = ira_class_hard_reg_index[rclass][hard_regno];
       if (i < 0)
 	continue;
-      mode = ALLOCNO_MODE (a);
-      hard_reg_class = REGNO_REG_CLASS (hard_regno);
-      ira_init_register_move_cost_if_necessary (mode);
-      cost
-	= (to_p ? ira_register_move_cost[mode][hard_reg_class][rclass]
-	   : ira_register_move_cost[mode][rclass][hard_reg_class]) * freq;
-      ira_allocate_and_set_costs (&ALLOCNO_HARD_REG_COSTS (a), rclass,
-				  ALLOCNO_CLASS_COST (a));
-      ira_allocate_and_set_costs (&ALLOCNO_CONFLICT_HARD_REG_COSTS (a),
-				  rclass, 0);
-      ALLOCNO_HARD_REG_COSTS (a)[i] -= cost;
-      ALLOCNO_CONFLICT_HARD_REG_COSTS (a)[i] -= cost;
-      ALLOCNO_CLASS_COST (a) = MIN (ALLOCNO_CLASS_COST (a),
-				    ALLOCNO_HARD_REG_COSTS (a)[i]);
+      if (flag_ira_hard_reg_pref)
+	ira_add_allocno_pref (a, hard_regno, freq);
+      else
+	{
+	  int cost;
+	  enum reg_class hard_reg_class;
+	  enum machine_mode mode;
+
+	  mode = ALLOCNO_MODE (a);
+	  hard_reg_class = REGNO_REG_CLASS (hard_regno);
+	  ira_init_register_move_cost_if_necessary (mode);
+	  cost
+	    = (to_p ? ira_register_move_cost[mode][hard_reg_class][rclass]
+	       : ira_register_move_cost[mode][rclass][hard_reg_class]) * freq;
+	  ira_allocate_and_set_costs (&ALLOCNO_HARD_REG_COSTS (a), rclass,
+				      ALLOCNO_CLASS_COST (a));
+	  ira_allocate_and_set_costs (&ALLOCNO_CONFLICT_HARD_REG_COSTS (a),
+				      rclass, 0);
+	  ALLOCNO_HARD_REG_COSTS (a)[i] -= cost;
+	  ALLOCNO_CONFLICT_HARD_REG_COSTS (a)[i] -= cost;
+	  ALLOCNO_CLASS_COST (a) = MIN (ALLOCNO_CLASS_COST (a),
+					ALLOCNO_HARD_REG_COSTS (a)[i]);
+	}
     }
 }
 
Index: ira-int.h
===================================================================
--- ira-int.h	(revision 195438)
+++ ira-int.h	(working copy)
@@ -60,6 +60,7 @@ 
    allocnos.  */
 typedef struct live_range *live_range_t;
 typedef struct ira_allocno *ira_allocno_t;
+typedef struct ira_allocno_pref *ira_pref_t;
 typedef struct ira_allocno_copy *ira_copy_t;
 typedef struct ira_object *ira_object_t;
 
@@ -349,6 +350,8 @@ 
      register class living at the point than number of hard-registers
      of the class available for the allocation.  */
   int excess_pressure_points_num;
+  /* Allocno hard reg preferences.  */
+  ira_pref_t allocno_prefs;
   /* Copies to other non-conflicting allocnos.  The copies can
      represent move insn or potential move insn usually because of two
      operand insn constraints.  */
@@ -429,6 +432,7 @@ 
 #define ALLOCNO_BAD_SPILL_P(A) ((A)->bad_spill_p)
 #define ALLOCNO_ASSIGNED_P(A) ((A)->assigned_p)
 #define ALLOCNO_MODE(A) ((A)->mode)
+#define ALLOCNO_PREFS(A) ((A)->allocno_prefs)
 #define ALLOCNO_COPIES(A) ((A)->allocno_copies)
 #define ALLOCNO_HARD_REG_COSTS(A) ((A)->hard_reg_costs)
 #define ALLOCNO_UPDATED_HARD_REG_COSTS(A) ((A)->updated_hard_reg_costs)
@@ -519,6 +523,33 @@ 
 /* The size of the previous array.  */
 extern int ira_objects_num;
 
+/* The following structure represents a hard register prefererence of
+   allocno.  The preference represent move insns or potential move
+   insns usually because of two operand insn constraints.  One move
+   operand is a hard register.  */
+struct ira_allocno_pref
+{
+  /* The unique order number of the preference node starting with 0.  */
+  int num;
+  /* Preferred hard register.  */
+  int hard_regno;
+  /* Accumulated execution frequency of insns from which the
+     preference created.  */
+  int freq;
+  /* Given allocno.  */
+  ira_allocno_t allocno;
+  /* All prefernces with the same allocno are linked by the following
+     member.  */
+  ira_pref_t next_pref;
+};
+
+/* Array of references to all allocno preferences.  The order number
+   of the preference corresponds to the index in the array.  */
+extern ira_pref_t *ira_prefs;
+
+/* Size of the previous array.  */
+extern int ira_prefs_num;
+
 /* The following structure represents a copy of two allocnos.  The
    copies represent move insns or potential move insns usually because
    of two operand insn constraints.  To remove register shuffle, we
@@ -935,6 +966,10 @@ 
 extern ira_loop_tree_node_t ira_curr_loop_tree_node;
 extern ira_allocno_t *ira_curr_regno_allocno_map;
 
+extern void ira_debug_pref (ira_pref_t);
+extern void ira_debug_prefs (void);
+extern void ira_debug_allocno_prefs (ira_allocno_t);
+
 extern void ira_debug_copy (ira_copy_t);
 extern void ira_debug_copies (void);
 extern void ira_debug_allocno_copies (ira_allocno_t);
@@ -961,10 +996,10 @@ 
 extern void ira_finish_live_range (live_range_t);
 extern void ira_finish_live_range_list (live_range_t);
 extern void ira_free_allocno_updated_costs (ira_allocno_t);
+extern ira_pref_t ira_create_pref (ira_allocno_t, int, int);
+extern void ira_add_allocno_pref (ira_allocno_t, int, int);
 extern ira_copy_t ira_create_copy (ira_allocno_t, ira_allocno_t,
 				   int, bool, rtx, ira_loop_tree_node_t);
-extern void ira_add_allocno_copy_to_list (ira_copy_t);
-extern void ira_swap_allocno_copy_ends_if_necessary (ira_copy_t);
 extern ira_copy_t ira_add_allocno_copy (ira_allocno_t, ira_allocno_t, int,
 					bool, rtx, ira_loop_tree_node_t);
 
@@ -1147,6 +1182,44 @@ 
        ira_allocno_object_iter_cond (&(ITER), (A), &(O));)
 
 
+/* The iterator for prefs.  */
+typedef struct {
+  /* The number of the current element in IRA_PREFS.  */
+  int n;
+} ira_pref_iterator;
+
+/* Initialize the iterator I.  */
+static inline void
+ira_pref_iter_init (ira_pref_iterator *i)
+{
+  i->n = 0;
+}
+
+/* Return TRUE if we have more prefs to visit, in which case *PREF is
+   set to the pref to be visited.  Otherwise, return FALSE.  */
+static inline bool
+ira_pref_iter_cond (ira_pref_iterator *i, ira_pref_t *pref)
+{
+  int n;
+
+  for (n = i->n; n < ira_prefs_num; n++)
+    if (ira_prefs[n] != NULL)
+      {
+	*pref = ira_prefs[n];
+	i->n = n + 1;
+	return true;
+      }
+  return false;
+}
+
+/* Loop over all prefs.  In each iteration, P is set to the next
+   pref.  ITER is an instance of ira_pref_iterator used to iterate
+   the prefs.  */
+#define FOR_EACH_PREF(P, ITER)				\
+  for (ira_pref_iter_init (&(ITER));			\
+       ira_pref_iter_cond (&(ITER), &(P));)
+
+
 /* The iterator for copies.  */
 typedef struct {
   /* The number of the current element in IRA_COPIES.  */
Index: regmove.c
===================================================================
--- regmove.c	(revision 195438)
+++ regmove.c	(working copy)
@@ -1241,9 +1241,9 @@ 
   for (i = nregs; --i >= 0; )
     regno_src_regno[i] = -1;
 
-  /* A forward pass.  Replace output operands with input operands.  */
-  regmove_forward_pass ();
-
+  if (! flag_ira_hard_reg_pref)
+    /* A forward pass.  Replace output operands with input operands.  */
+    regmove_forward_pass ();
   /* A backward pass.  Replace input operands with output operands.  */
   regmove_backward_pass ();