diff mbox

IRA improvements 3/4

Message ID 4C3481BA.1010206@redhat.com
State New
Headers show

Commit Message

Vladimir Makarov July 7, 2010, 1:31 p.m. UTC
The following patch speeds up IRA by decreasing number of processed
register classes for allocno.  To decrease the number, classes
specific for given allocno is used.  For example, if allocno of DIMode
and FLOAT_REGS can not hold such value, we should not try FLOAT_REGS
or if we decided that only floating point registers are profitable for
given allocno on the 1st pass, we most probably should not try general
registers on the 2nd pass.

Is the patch ok to commit the patch to the trunk?

2010-07-07  Vladimir Makarov  <vmakarov@redhat.com>

    * ira-costs.c (cost_classes, cost_classes_num): Remove.
    (struct cost_classes, cost_classes_t, const_cost_classes_t,
    regno_cost_classes, cost_classes_hash, cost_classes_eq,
    cost_classes_del, cost_classes_htab, cost_classes_mode_cache,
    initiate_regno_cost_classes, setup_cost_classes,
    setup_regno_cost_classes_by_aclass,
    setup_regno_cost_classes_by_mode, finish_regno_cost_classes): New.
    (record_reg_classes): Use regno_cost_classes instead of
    cost_classes.
    (record_address_regs, scan_one_insn): Ditto.
    (print_allocno_costs, print_pseudo_costs): Ditto.
    (find_costs_and_classes): Set up cost classes for each registers.
    Use also their mode for this.  Use regno_cost_classes instead of
    cost_classes.
    (setup_allocno_class_and_costs): Use regno_cost_classes instead of
    cost_classes.
    (free_ira_costs, ira_init_costs): Don't use cost_classes.
    (ira_costs, ira_set_pseudo_classes): Call
    initiate_regno_cost_classes and finish_regno_cost_classes.

Comments

Jeff Law July 15, 2010, 4:28 p.m. UTC | #1
On 07/07/10 07:31, Vladimir Makarov wrote:
>  The following patch speeds up IRA by decreasing number of processed
> register classes for allocno.  To decrease the number, classes
> specific for given allocno is used.  For example, if allocno of DIMode
> and FLOAT_REGS can not hold such value, we should not try FLOAT_REGS
> or if we decided that only floating point registers are profitable for
> given allocno on the 1st pass, we most probably should not try general
> registers on the 2nd pass.
>
> Is the patch ok to commit the patch to the trunk?
>
> 2010-07-07  Vladimir Makarov <vmakarov@redhat.com>
>
>    * ira-costs.c (cost_classes, cost_classes_num): Remove.
>    (struct cost_classes, cost_classes_t, const_cost_classes_t,
>    regno_cost_classes, cost_classes_hash, cost_classes_eq,
>    cost_classes_del, cost_classes_htab, cost_classes_mode_cache,
>    initiate_regno_cost_classes, setup_cost_classes,
>    setup_regno_cost_classes_by_aclass,
>    setup_regno_cost_classes_by_mode, finish_regno_cost_classes): New.
>    (record_reg_classes): Use regno_cost_classes instead of
>    cost_classes.
>    (record_address_regs, scan_one_insn): Ditto.
>    (print_allocno_costs, print_pseudo_costs): Ditto.
>    (find_costs_and_classes): Set up cost classes for each registers.
>    Use also their mode for this.  Use regno_cost_classes instead of
>    cost_classes.
>    (setup_allocno_class_and_costs): Use regno_cost_classes instead of
>    cost_classes.
>    (free_ira_costs, ira_init_costs): Don't use cost_classes.
>    (ira_costs, ira_set_pseudo_classes): Call
>    initiate_regno_cost_classes and finish_regno_cost_classes.
>
>
nit: s/memebrs/members/

> -          for (k = 0; k < cost_classes_num; k++)
> +          for (k = cost_classes_ptr->num - 1; k >= 0; k--)
Presumably you made this change to improve the performance of the loop 
(and others like it) and presumably the loop body is too complex for 
GCC's loop reversal to apply automatically and LICM couldn't remove the 
memory reference cost_classes_ptr->num from each iteration of the loop?


Nit:
> -    if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
> +    if (REG_P (ops[i]) && (regno = REGNO (ops[i])) >= 
> FIRST_PSEUDO_REGISTER)
You seem to introduce this style of code in several places.  I've never 
been a big fan as I think it makes the code harder to read and more 
prone to silly operator precedence errors (particularly when someone 
else changes things later).   If we can cleanly avoid such code, let's 
do so.  If this style is the cleanest way to express the conditional, 
then I'm not going to get terribly bent out of shape about it.

Based on your description, I'm assuming this patch is primarily designed 
to help compile-time after removal of cover classes, right?  From my 
reading, it may still be applicable to IRA in the mainline today, though 
perhaps without providing real compile-time benefts.   If so, I wouldn't 
object to this going into the mainline after separate test/bootstrap 
cycle if you wanted to reduce the divergence of your ira-improv branch.

jeff
Vladimir Makarov July 15, 2010, 6:38 p.m. UTC | #2
Jeff Law wrote:
> On 07/07/10 07:31, Vladimir Makarov wrote:
>>  The following patch speeds up IRA by decreasing number of processed
>> register classes for allocno.  To decrease the number, classes
>> specific for given allocno is used.  For example, if allocno of DIMode
>> and FLOAT_REGS can not hold such value, we should not try FLOAT_REGS
>> or if we decided that only floating point registers are profitable for
>> given allocno on the 1st pass, we most probably should not try general
>> registers on the 2nd pass.
>>
>> Is the patch ok to commit the patch to the trunk?
>>
>> 2010-07-07  Vladimir Makarov <vmakarov@redhat.com>
>>
>>    * ira-costs.c (cost_classes, cost_classes_num): Remove.
>>    (struct cost_classes, cost_classes_t, const_cost_classes_t,
>>    regno_cost_classes, cost_classes_hash, cost_classes_eq,
>>    cost_classes_del, cost_classes_htab, cost_classes_mode_cache,
>>    initiate_regno_cost_classes, setup_cost_classes,
>>    setup_regno_cost_classes_by_aclass,
>>    setup_regno_cost_classes_by_mode, finish_regno_cost_classes): New.
>>    (record_reg_classes): Use regno_cost_classes instead of
>>    cost_classes.
>>    (record_address_regs, scan_one_insn): Ditto.
>>    (print_allocno_costs, print_pseudo_costs): Ditto.
>>    (find_costs_and_classes): Set up cost classes for each registers.
>>    Use also their mode for this.  Use regno_cost_classes instead of
>>    cost_classes.
>>    (setup_allocno_class_and_costs): Use regno_cost_classes instead of
>>    cost_classes.
>>    (free_ira_costs, ira_init_costs): Don't use cost_classes.
>>    (ira_costs, ira_set_pseudo_classes): Call
>>    initiate_regno_cost_classes and finish_regno_cost_classes.
>>
>>
> nit: s/memebrs/members/
>
>> -          for (k = 0; k < cost_classes_num; k++)
>> +          for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> Presumably you made this change to improve the performance of the loop 
> (and others like it) and presumably the loop body is too complex for 
> GCC's loop reversal to apply automatically and LICM couldn't remove 
> the memory reference cost_classes_ptr->num from each iteration of the 
> loop?
>
>
Yes, usually I expect that this kind of optimizations (moving invariant 
and removing living pseudo by loop reversal by this case) are done by 
gcc.  But when I check the assembler code, it is sometimes not what I 
expected.  As I remember that was the case for this critical loop when I 
compiled by gcc4.3.

 
> Nit:
>> -    if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
>> +    if (REG_P (ops[i]) && (regno = REGNO (ops[i])) >= 
>> FIRST_PSEUDO_REGISTER)
> You seem to introduce this style of code in several places.  I've 
> never been a big fan as I think it makes the code harder to read and 
> more prone to silly operator precedence errors (particularly when 
> someone else changes things later).   If we can cleanly avoid such 
> code, let's do so.  If this style is the cleanest way to express the 
> conditional, then I'm not going to get terribly bent out of shape 
> about it.
>
In this case, it can be removed for clean code.  I'll do it.  Although 
I'd say it is quite a frequent practice in gcc.
> Based on your description, I'm assuming this patch is primarily 
> designed to help compile-time after removal of cover classes, right?  
> From my reading, it may still be applicable to IRA in the mainline 
> today, though perhaps without providing real compile-time benefts.   
> If so, I wouldn't object to this going into the mainline after 
> separate test/bootstrap cycle if you wanted to reduce the divergence 
> of your ira-improv branch.
>
Yes, right.  In general, it would be easier for me to merge the branch 
into the mainline.  I spent some time to divide the branch changes into 
several patches mainly for easier review.  The order of applying patches 
are cover.patch, compress.patch, costs.patch and finally speedup.patch.  
Only compress.patch is independent.

My plan is now to implement your and other proposals on the branch, wait 
for Bernd's patches on the trunk, merge the branch with the trunk, 
resolve the conflicts, and submitting the patches again.


Thanks for the review.  I really appreciate that you looked at these big 
patches so quickly.
Jeff Law July 15, 2010, 6:47 p.m. UTC | #3
On 07/15/10 12:38, Vladimir Makarov wrote:
>>
>>> -          for (k = 0; k < cost_classes_num; k++)
>>> +          for (k = cost_classes_ptr->num - 1; k >= 0; k--)
>> Presumably you made this change to improve the performance of the 
>> loop (and others like it) and presumably the loop body is too complex 
>> for GCC's loop reversal to apply automatically and LICM couldn't 
>> remove the memory reference cost_classes_ptr->num from each iteration 
>> of the loop?
>>
>>
>
> Yes, usually I expect that this kind of optimizations (moving 
> invariant and removing living pseudo by loop reversal by this case) 
> are done by gcc.  But when I check the assembler code, it is sometimes 
> not what I expected.  As I remember that was the case for this 
> critical loop when I compiled by gcc4.3.
Well, its usually the case (when I looked at things like this) we're 
unable to prove the load is invariant due to an aliased store, function 
call, etc.  If you can't prove the load is invariant, then the load 
obviously can't be hoisted.  That in turn prevents loop reversal, which 
is probably the less important of the two optimizations.


>
>
>> Nit:
>>> -    if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
>>> +    if (REG_P (ops[i]) && (regno = REGNO (ops[i])) >= 
>>> FIRST_PSEUDO_REGISTER)
>> You seem to introduce this style of code in several places.  I've 
>> never been a big fan as I think it makes the code harder to read and 
>> more prone to silly operator precedence errors (particularly when 
>> someone else changes things later).   If we can cleanly avoid such 
>> code, let's do so.  If this style is the cleanest way to express the 
>> conditional, then I'm not going to get terribly bent out of shape 
>> about it.
>>
> In this case, it can be removed for clean code.  I'll do it.  Although 
> I'd say it is quite a frequent practice in gcc.
Yup, it's all over the place, and in some cases it makes sense, in 
others it doesn't.  As I said, I'm not going to get too nuts about this 
since it's really a personal preference.


>> Based on your description, I'm assuming this patch is primarily 
>> designed to help compile-time after removal of cover classes, right?  
>> From my reading, it may still be applicable to IRA in the mainline 
>> today, though perhaps without providing real compile-time benefts.   
>> If so, I wouldn't object to this going into the mainline after 
>> separate test/bootstrap cycle if you wanted to reduce the divergence 
>> of your ira-improv branch.
>>
> Yes, right.  In general, it would be easier for me to merge the branch 
> into the mainline.  I spent some time to divide the branch changes 
> into several patches mainly for easier review.  The order of applying 
> patches are cover.patch, compress.patch, costs.patch and finally 
> speedup.patch.  Only compress.patch is independent.
>
> My plan is now to implement your and other proposals on the branch, 
> wait for Bernd's patches on the trunk, merge the branch with the 
> trunk, resolve the conflicts, and submitting the patches again.
Sounds good.  I'll be basically doing the same since I've got patches 
which touch on many of the same places..

jeff
Andi Kleen July 16, 2010, 8:07 a.m. UTC | #4
Vladimir Makarov <vmakarov@redhat.com> writes:
>>
> Yes, usually I expect that this kind of optimizations (moving
> invariant and removing living pseudo by loop reversal by this case)
> are done by gcc.  But when I check the assembler code, it is sometimes
> not what I expected.  As I remember that was the case for this
> critical loop when I compiled by gcc4.3.

This was fixed some time ago with PR40886. It also worked in gcc 3.x.
Should work again with gcc 4.5.

-Andi
diff mbox

Patch

--- t/ira-costs.c	2010-07-06 14:00:57.000000000 -0400
+++ ira-costs.c	2010-07-06 14:43:00.000000000 -0400
@@ -87,18 +87,6 @@  static struct costs *costs;
 /* Accumulated costs of each class for each allocno.  */
 static struct costs *total_allocno_costs;
 
-/* Classes used for cost calculation.  They may be different on
-   different iterations of the cost calculations or in different
-   optimization modes.  */
-static enum reg_class *cost_classes;
-
-/* The size of the previous array.  */
-static int cost_classes_num;
-
-/* Map: cost class -> order number (they start with 0) of the cost
-   class.  The order number is negative for non-cost classes.  */
-static int cost_class_nums[N_REG_CLASSES];
-
 /* It is the current size of struct costs.  */
 static int struct_costs_size;
 
@@ -108,8 +96,8 @@  static int struct_costs_size;
   ((struct costs *) ((char *) (arr) + (num) * struct_costs_size))
 
 /* Return index in COSTS when processing reg with REGNO.  */
-#define COST_INDEX(regno) (allocno_p 					      \
-                           ? ALLOCNO_NUM (ira_curr_regno_allocno_map[regno])  \
+#define COST_INDEX(regno) (allocno_p 					     \
+                           ? ALLOCNO_NUM (ira_curr_regno_allocno_map[regno]) \
 			   : (int) regno)
 
 /* Record register class preferences of each allocno or pseudo.  Null
@@ -132,6 +120,191 @@  static int frequency;
 
 
 
+/* Info about reg classes whose costs are calculated for a pseudo.  */
+struct cost_classes
+{
+  /* Number of the cost classes in the subsequent array.  */
+  int num;
+  /* Container of the cost classes.  */
+  enum reg_class classes[N_REG_CLASSES];
+  /* Map reg class -> index of the reg class in the previous array.
+     -1 if it is not a cost classe.  */
+  int index[N_REG_CLASSES];
+  /* Map hard regno index of first class in array CLASSES containing
+     the hard regno, -1 otherwise.  */
+  int hard_regno_index[FIRST_PSEUDO_REGISTER];
+};
+
+/* Types of pointers to the structure above.  */
+typedef struct cost_classes *cost_classes_t;
+typedef const struct cost_classes *const_cost_classes_t;
+
+/* Info about cost classes for each pseudo.  */
+static cost_classes_t *regno_cost_classes;
+
+/* Returns hash value for cost classes info V.  */
+static hashval_t
+cost_classes_hash (const void *v)
+{
+  const_cost_classes_t hv = (const_cost_classes_t) v;
+
+  return iterative_hash (&hv->classes, sizeof (enum reg_class) * hv->num, 0);
+}
+
+/* Compares cost classes info V1 and V2.  */
+static int
+cost_classes_eq (const void *v1, const void *v2)
+{
+  const_cost_classes_t hv1 = (const_cost_classes_t) v1;
+  const_cost_classes_t hv2 = (const_cost_classes_t) v2;
+
+  return hv1->num == hv2->num && memcmp (hv1->classes, hv2->classes,
+					 sizeof (enum reg_class) * hv1->num);
+}
+
+/* Delete cost classes info V from the hash table.  */
+static void
+cost_classes_del (void *v)
+{
+  ira_free (v);
+}
+
+/* Hash table of unique cost classes.  */
+static htab_t cost_classes_htab;
+
+/* Map allocno class -> cost classes for pseudo of given allocno
+   class.  */
+static cost_classes_t cost_classes_aclass_cache[N_REG_CLASSES];
+
+/* Map mode -> cost classes for pseudo of give mode.  */
+static cost_classes_t cost_classes_mode_cache[MAX_MACHINE_MODE];
+
+/* Initialize info about the cost classes for each pseudo.  */
+static void
+initiate_regno_cost_classes (void)
+{
+  int size = sizeof (cost_classes_t) * max_reg_num ();
+
+  regno_cost_classes = (cost_classes_t *) ira_allocate (size);
+  memset (regno_cost_classes, 0, size);
+  memset (cost_classes_aclass_cache, 0,
+	  sizeof (cost_classes_t) * N_REG_CLASSES);
+  memset (cost_classes_mode_cache, 0,
+	  sizeof (cost_classes_t) * MAX_MACHINE_MODE);
+  cost_classes_htab
+    = htab_create (200, cost_classes_hash, cost_classes_eq, cost_classes_del);
+}
+
+/* Create new classes from FROM and set up memebrs index and
+   hard_regno_index.  Return the new classes.  */
+static cost_classes_t
+setup_cost_classes (cost_classes_t from)
+{
+  cost_classes_t classes_ptr;
+  enum reg_class cl;
+  int i, j, hard_regno;
+
+  classes_ptr = (cost_classes_t) ira_allocate (sizeof (struct cost_classes));
+  classes_ptr->num = from->num;
+  for (i = 0; i < N_REG_CLASSES; i++)
+    classes_ptr->index[i] = -1;
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    classes_ptr->hard_regno_index[i] = -1;
+  for (i = 0; i < from->num; i++)
+    {
+      cl = classes_ptr->classes[i] = from->classes[i];
+      classes_ptr->index[cl] = i;
+      for (j = ira_class_hard_regs_num[cl] - 1; j >= 0; j--)
+	{
+	  hard_regno = ira_class_hard_regs[cl][j];
+	  if (classes_ptr->hard_regno_index[hard_regno] < 0)
+	    classes_ptr->hard_regno_index[hard_regno] = i;
+	}
+    }
+  return classes_ptr;
+}
+
+/* Setup cost classes for REGNO whose allocno class ACLASS.  */
+static void
+setup_regno_cost_classes_by_aclass (int regno, enum reg_class aclass)
+{
+  static struct cost_classes classes;
+  cost_classes_t classes_ptr;
+  enum reg_class cl;
+  int i;
+  PTR *slot;
+  HARD_REG_SET temp, temp2;
+
+  if ((classes_ptr = cost_classes_aclass_cache[aclass]) == NULL)
+    {
+      COPY_HARD_REG_SET (temp, reg_class_contents[aclass]);
+      AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs);
+      classes.num = 0;
+      for (i = 0; i < ira_important_classes_num; i++)
+	{
+	  cl = ira_important_classes[i];
+	  COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
+	  AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
+	  if (! ira_reg_pressure_class_p[cl]
+	      && hard_reg_set_subset_p (temp2, temp) && cl != aclass)
+	    continue;
+	  classes.classes[classes.num++] = cl;
+	}
+      slot = htab_find_slot (cost_classes_htab, &classes, INSERT);
+      if (*slot == NULL)
+	{
+	  classes_ptr = setup_cost_classes (&classes);
+	  *slot = classes_ptr;
+	}
+      classes_ptr = cost_classes_aclass_cache[aclass] = (cost_classes_t) *slot;
+    }
+  regno_cost_classes[regno] = classes_ptr;
+}
+
+/* Setup cost classes for REGNO with MODE.  */
+static void
+setup_regno_cost_classes_by_mode (int regno, enum machine_mode mode)
+{
+  static struct cost_classes classes;
+  cost_classes_t classes_ptr;
+  enum reg_class cl;
+  int i;
+  PTR *slot;
+  HARD_REG_SET temp;
+
+  if ((classes_ptr = cost_classes_mode_cache[mode]) == NULL)
+    {
+      classes.num = 0;
+      for (i = 0; i < ira_important_classes_num; i++)
+	{
+	  cl = ira_important_classes[i];
+	  COPY_HARD_REG_SET (temp, ira_prohibited_class_mode_regs[cl][mode]);
+	  IOR_HARD_REG_SET (temp, ira_no_alloc_regs);
+	  if (hard_reg_set_subset_p (reg_class_contents[cl], temp))
+	    continue;
+	  classes.classes[classes.num++] = cl;
+	}
+      slot = htab_find_slot (cost_classes_htab, &classes, INSERT);
+      if (*slot == NULL)
+	{
+	  classes_ptr = setup_cost_classes (&classes);
+	  *slot = classes_ptr;
+	}
+      cost_classes_mode_cache[mode] = (cost_classes_t) *slot;
+    }
+  regno_cost_classes[regno] = classes_ptr;
+}
+
+/* Finilize info about the cost classes for each pseudo.  */
+static void
+finish_regno_cost_classes (void)
+{
+  ira_free (regno_cost_classes);
+  htab_delete (cost_classes_htab);
+}
+
+
+
 /* Compute the cost of loading X into (if TO_P is TRUE) or from (if
    TO_P is FALSE) a register of class RCLASS in mode MODE.  X must not
    be a pseudo register.  */
@@ -213,7 +386,7 @@  record_reg_classes (int n_alts, int n_op
 		    enum reg_class *pref)
 {
   int alt;
-  int i, j, k;
+  int i, j, k, regno, other_regno;
   rtx set;
   int insn_allows_mem[MAX_RECOG_OPERANDS];
 
@@ -296,7 +469,7 @@  record_reg_classes (int n_alts, int n_op
 		    }
 		}
 	      else if (! REG_P (ops[j])
-		       || REGNO (ops[j]) < FIRST_PSEUDO_REGISTER)
+		       || (regno = REGNO (ops[j])) < FIRST_PSEUDO_REGISTER)
 		{
 		  /* This op is an allocno but the one it matches is
 		     not.  */
@@ -319,8 +492,10 @@  record_reg_classes (int n_alts, int n_op
 		     Moreover, if we cannot tie them, this alternative
 		     needs to do a copy, which is one insn.  */
 		  struct costs *pp = this_op_costs[i];
+		  cost_classes_t cost_classes_ptr = regno_cost_classes[REGNO (op)];
+		  enum reg_class *cost_classes = cost_classes_ptr->classes;
 
-		  for (k = 0; k < cost_classes_num; k++)
+		  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
 		    {
 		      rclass = cost_classes[k];
 		      pp->cost[k]
@@ -328,8 +503,8 @@  record_reg_classes (int n_alts, int n_op
 			     ? ira_get_may_move_cost (mode, rclass,
 						      classes[i], true) : 0)
 			    + (recog_data.operand_type[i] != OP_IN
-			       ? ira_get_may_move_cost (mode, classes[i],
-							rclass, false) : 0))
+			       ? ira_get_may_move_cost (mode, rclass,
+							classes[i], false) : 0))
 			   * frequency);
 		    }
 
@@ -544,7 +719,7 @@  record_reg_classes (int n_alts, int n_op
 	     allocated for register preferencing.  If some register
 	     class is valid, compute the costs of moving the allocno
 	     into that class.  */
-	  if (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER)
+	  if (REG_P (op) && (regno = REGNO (op)) >= FIRST_PSEUDO_REGISTER)
 	    {
 	      if (classes[i] == NO_REGS)
 		{
@@ -559,8 +734,10 @@  record_reg_classes (int n_alts, int n_op
 	      else
 		{
 		  struct costs *pp = this_op_costs[i];
+		  cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
+		  enum reg_class *cost_classes = cost_classes_ptr->classes;
 
-		  for (k = 0; k < cost_classes_num; k++)
+		  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
 		    {
 		      rclass = cost_classes[k];
 		      pp->cost[k]
@@ -642,15 +819,16 @@  record_reg_classes (int n_alts, int n_op
       /* Finally, update the costs with the information we've
 	 calculated about this alternative.  */
       for (i = 0; i < n_ops; i++)
-	if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
+	if (REG_P (ops[i]) && (regno = REGNO (ops[i])) >= FIRST_PSEUDO_REGISTER)
 	  {
 	    struct costs *pp = op_costs[i], *qq = this_op_costs[i];
 	    int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);
+	    cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
 
 	    pp->mem_cost = MIN (pp->mem_cost,
 				(qq->mem_cost + op_cost_add) * scale);
 
-	    for (k = 0; k < cost_classes_num; k++)
+	    for (k = cost_classes_ptr->num - 1; k >= 0; k--)
 	      pp->cost[k]
 		= MIN (pp->cost[k], (qq->cost[k] + op_cost_add) * scale);
 	  }
@@ -688,37 +866,38 @@  record_reg_classes (int n_alts, int n_op
       && REG_P (ops[0]) && REG_P (ops[1])
       && find_regno_note (insn, REG_DEAD, REGNO (ops[1])))
     for (i = 0; i <= 1; i++)
-      if (REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
+      if ((regno = REGNO (ops[i])) >= FIRST_PSEUDO_REGISTER
+	  && (other_regno = REGNO (ops[!i])) < FIRST_PSEUDO_REGISTER)
 	{
-	  unsigned int regno = REGNO (ops[!i]);
 	  enum machine_mode mode = GET_MODE (ops[!i]);
+	  cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
+	  enum reg_class *cost_classes = cost_classes_ptr->classes;
 	  enum reg_class rclass;
-	  unsigned int nr;
+	  int nr;
 
-	  if (regno < FIRST_PSEUDO_REGISTER)
-	    for (k = 0; k < cost_classes_num; k++)
-	      {
-		rclass = cost_classes[k];
-		if (TEST_HARD_REG_BIT (reg_class_contents[rclass], regno)
-		    && (reg_class_size[rclass]
-			== (unsigned) CLASS_MAX_NREGS (rclass, mode)))
-		  {
-		    if (reg_class_size[rclass] == 1)
-		      op_costs[i]->cost[k] = -frequency;
-		    else
-		      {
-			for (nr = 0;
-			     nr < (unsigned) hard_regno_nregs[regno][mode];
-			     nr++)
-			  if (! TEST_HARD_REG_BIT (reg_class_contents[rclass],
-						   regno + nr))
-			    break;
-
-			if (nr == (unsigned) hard_regno_nregs[regno][mode])
-			  op_costs[i]->cost[k] = -frequency;
-		      }
-		  }
-	      }
+	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+	    {
+	      rclass = cost_classes[k];
+	      if (TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno)
+		  && (reg_class_size[rclass]
+		      == (unsigned) CLASS_MAX_NREGS (rclass, mode)))
+		{
+		  if (reg_class_size[rclass] == 1)
+		    op_costs[i]->cost[k] = -frequency;
+		  else
+		    {
+		      for (nr = 0;
+			   nr < hard_regno_nregs[other_regno][mode];
+			   nr++)
+			if (! TEST_HARD_REG_BIT (reg_class_contents[rclass],
+						 other_regno + nr))
+			  break;
+		      
+		      if (nr == hard_regno_nregs[other_regno][mode])
+			op_costs[i]->cost[k] = -frequency;
+		    }
+		}
+	    }
 	}
 }
 
@@ -906,20 +1085,24 @@  record_address_regs (enum machine_mode m
       {
 	struct costs *pp;
 	enum reg_class i;
-	int k, add_cost;
+	int k, regno, add_cost;
+	cost_classes_t cost_classes_ptr;
+	enum reg_class *cost_classes;
 
-	if (REGNO (x) < FIRST_PSEUDO_REGISTER)
+	if ((regno = REGNO (x)) < FIRST_PSEUDO_REGISTER)
 	  break;
 
 	if (allocno_p)
-	  ALLOCNO_BAD_SPILL_P (ira_curr_regno_allocno_map[REGNO (x)]) = true;
-	pp = COSTS (costs, COST_INDEX (REGNO (x)));
+	  ALLOCNO_BAD_SPILL_P (ira_curr_regno_allocno_map[regno]) = true;
+	pp = COSTS (costs, COST_INDEX (regno));
 	add_cost = (ira_memory_move_cost[Pmode][rclass][1] * scale) / 2;
 	if (INT_MAX - add_cost < pp->mem_cost)
 	  pp->mem_cost = INT_MAX;
 	else
 	  pp->mem_cost += add_cost;
-	for (k = 0; k < cost_classes_num; k++)
+	cost_classes_ptr = regno_cost_classes[regno];
+	cost_classes = cost_classes_ptr->classes;
+	for (k = cost_classes_ptr->num - 1; k >= 0; k--)
 	  {
 	    i = cost_classes[k];
 	    add_cost = (ira_get_may_move_cost (Pmode, i, rclass, true) * scale) / 2;
@@ -981,7 +1164,7 @@  record_operand_costs (rtx insn, struct c
 	record_address_regs (VOIDmode, recog_data.operand[i], 0, ADDRESS,
 			     SCRATCH, frequency * 2);
     }
-
+  
   /* Check for commutative in a separate loop so everything will have
      been initialized.  We must do this even if one operand is a
      constant--see addsi3 in m68k.md.  */
@@ -1017,7 +1200,7 @@  scan_one_insn (rtx insn)
 {
   enum rtx_code pat_code;
   rtx set, note;
-  int i, k;
+  int i, k, regno;
 
   if (!NONDEBUG_INSN_P (insn))
     return insn;
@@ -1041,8 +1224,6 @@  scan_one_insn (rtx insn)
       rtx reg = SET_DEST (set);
       int num = COST_INDEX (REGNO (reg));
 
-      if (pref)
-	cl = pref[num];
       COSTS (costs, num)->mem_cost
 	-= ira_memory_move_cost[GET_MODE (reg)][cl][1] * frequency;
       record_address_regs (GET_MODE (SET_SRC (set)), XEXP (SET_SRC (set), 0),
@@ -1055,11 +1236,11 @@  scan_one_insn (rtx insn)
      allocno.  */
   for (i = 0; i < recog_data.n_operands; i++)
     if (REG_P (recog_data.operand[i])
-	&& REGNO (recog_data.operand[i]) >= FIRST_PSEUDO_REGISTER)
+	&& (regno = REGNO (recog_data.operand[i])) >= FIRST_PSEUDO_REGISTER)
       {
-	int regno = REGNO (recog_data.operand[i]);
 	struct costs *p = COSTS (costs, COST_INDEX (regno));
 	struct costs *q = op_costs[i];
+	cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
 	int add_cost;
 
 	add_cost = q->mem_cost;
@@ -1067,7 +1248,7 @@  scan_one_insn (rtx insn)
 	  p->mem_cost = INT_MAX;
 	else
 	  p->mem_cost += add_cost;
-	for (k = 0; k < cost_classes_num; k++)
+	for (k = cost_classes_ptr->num - 1; k >= 0; k--)
 	  {
 	    add_cost = q->cost[k];
 	    if (add_cost > 0 && INT_MAX - add_cost < p->cost[k])
@@ -1097,6 +1278,8 @@  print_allocno_costs (FILE *f)
       int i, rclass;
       basic_block bb;
       int regno = ALLOCNO_REGNO (a);
+      cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
+      enum reg_class *cost_classes = cost_classes_ptr->classes;
 
       i = ALLOCNO_NUM (a);
       fprintf (f, "  a%d(r%d,", i, regno);
@@ -1105,7 +1288,7 @@  print_allocno_costs (FILE *f)
       else
 	fprintf (f, "l%d", ALLOCNO_LOOP_TREE_NODE (a)->loop->num);
       fprintf (f, ") costs:");
-      for (k = 0; k < cost_classes_num; k++)
+      for (k = 0; k < cost_classes_ptr->num; k++)
 	{
 	  rclass = cost_classes[k];
 	  if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
@@ -1139,6 +1322,8 @@  print_pseudo_costs (FILE *f)
 {
   int regno, k;
   int rclass;
+  cost_classes_t cost_classes_ptr;
+  enum reg_class *cost_classes;
 
   ira_assert (! allocno_p);
   fprintf (f, "\n");
@@ -1146,8 +1331,10 @@  print_pseudo_costs (FILE *f)
     {
       if (REG_N_REFS (regno) <= 0)
 	continue;
+      cost_classes_ptr = regno_cost_classes[regno];
+      cost_classes = cost_classes_ptr->classes;
       fprintf (f, "  r%d costs:", regno);
-      for (k = 0; k < cost_classes_num; k++)
+      for (k = 0; k < cost_classes_ptr->num; k++)
 	{
 	  rclass = cost_classes[k];
 	  if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
@@ -1198,26 +1385,49 @@  process_bb_node_for_costs (ira_loop_tree
 static void
 find_costs_and_classes (FILE *dump_file)
 {
-  int i, k, start;
+  int i, k, start, max_cost_classes_num;
   int pass;
   basic_block bb;
+  enum reg_class *regno_best_class;
 
   init_recog ();
 #ifdef FORBIDDEN_INC_DEC_CLASSES
   in_inc_dec = ira_allocate (sizeof (bool) * cost_elements_num);
 #endif /* FORBIDDEN_INC_DEC_CLASSES */
-  pref = NULL;
-  start = 0;
-  if (!resize_reg_info () && allocno_p && pseudo_classes_defined_p)
+  regno_best_class
+    = (enum reg_class *) ira_allocate (max_reg_num ()
+				       * sizeof (enum reg_class));
+  for (i = max_reg_num () - 1; i >= FIRST_PSEUDO_REGISTER; i--)
+    regno_best_class[i] = NO_REGS;
+  if (!resize_reg_info () && allocno_p
+      && pseudo_classes_defined_p && flag_expensive_optimizations)
     {
       ira_allocno_t a;
       ira_allocno_iterator ai;
 
       pref = pref_buffer;
+      max_cost_classes_num = 1;
       FOR_EACH_ALLOCNO (a, ai)
-	pref[ALLOCNO_NUM (a)] = reg_preferred_class (ALLOCNO_REGNO (a));
-      if (flag_expensive_optimizations)
-	start = 1;
+	{
+	  pref[ALLOCNO_NUM (a)] = reg_preferred_class (ALLOCNO_REGNO (a));
+	  setup_regno_cost_classes_by_aclass
+	    (ALLOCNO_REGNO (a), pref[ALLOCNO_NUM (a)]);
+	  max_cost_classes_num
+	    = MAX (max_cost_classes_num,
+		   regno_cost_classes[ALLOCNO_REGNO (a)]->num);
+	}
+      start = 1;
+    }
+  else
+    {
+      pref = NULL;
+      max_cost_classes_num = ira_important_classes_num;
+      for (i = max_reg_num () - 1; i >= FIRST_PSEUDO_REGISTER; i--)
+	if (regno_reg_rtx[i] != NULL_RTX)
+ 	  setup_regno_cost_classes_by_mode (i, PSEUDO_REGNO_MODE (i));
+	else
+	  setup_regno_cost_classes_by_aclass (i, ALL_REGS);
+      start = 0;
     }
   if (allocno_p)
     /* Clear the flag for the next compiled function.  */
@@ -1231,19 +1441,20 @@  find_costs_and_classes (FILE *dump_file)
       if ((!allocno_p || internal_flag_ira_verbose > 0) && dump_file)
 	fprintf (dump_file,
 		 "\nPass %i for finding pseudo/allocno costs\n\n", pass);
-      for (i = 0; i < N_REG_CLASSES; i++)
-	cost_class_nums[i] = -1;
-      for (cost_classes_num = 0;
-	   cost_classes_num < ira_important_classes_num;
-	   cost_classes_num++)
+
+      if (pass != start)
 	{
-	  cost_classes[cost_classes_num]
-	    = ira_important_classes[cost_classes_num];
-	  cost_class_nums[cost_classes[cost_classes_num]]
-	    = cost_classes_num;
+	  max_cost_classes_num = 1;
+	  for (i = max_reg_num () - 1; i >= FIRST_PSEUDO_REGISTER; i--)
+	    {
+	      setup_regno_cost_classes_by_aclass (i, regno_best_class[i]);
+	      max_cost_classes_num
+		= MAX (max_cost_classes_num, regno_cost_classes[i]->num);
+	    }
 	}
+
       struct_costs_size
-	= sizeof (struct costs) + sizeof (int) * (cost_classes_num - 1);
+	= sizeof (struct costs) + sizeof (int) * (max_cost_classes_num - 1);
       /* Zero out our accumulation of the cost of each class for each
 	 allocno.  */
       memset (costs, 0, cost_elements_num * struct_costs_size);
@@ -1284,6 +1495,8 @@  find_costs_and_classes (FILE *dump_file)
 #ifdef FORBIDDEN_INC_DEC_CLASSES
 	  int inc_dec_p = false;
 #endif
+	  cost_classes_t cost_classes_ptr = regno_cost_classes[i];
+	  enum reg_class *cost_classes = cost_classes_ptr->classes;
 	  int equiv_savings = regno_equiv_gains[i];
 
 	  if (! allocno_p)
@@ -1318,17 +1531,14 @@  find_costs_and_classes (FILE *dump_file)
 		      /* Propagate costs to upper levels in the region
 			 tree.  */
 		      parent_a_num = ALLOCNO_NUM (parent_a);
-		      for (k = 0; k < cost_classes_num; k++)
+		      for (k = cost_classes_ptr->num - 1; k >= 0; k--)
 			{
-			  add_cost
-			    = COSTS (total_allocno_costs, a_num)->cost[k];
+			  add_cost = COSTS (total_allocno_costs, a_num)->cost[k];
 			  if (add_cost > 0 && INT_MAX - add_cost
 			      < COSTS (total_allocno_costs, parent_a_num)->cost[k])
-			    COSTS (total_allocno_costs, parent_a_num)->cost[k]
-			      = INT_MAX;
+			    COSTS (total_allocno_costs, parent_a_num)->cost[k] = INT_MAX;
 			  else
-			    COSTS (total_allocno_costs, parent_a_num)->cost[k]
-			      += add_cost;
+			    COSTS (total_allocno_costs, parent_a_num)->cost[k] += add_cost;
 			}
 		      add_cost = COSTS (total_allocno_costs, a_num)->mem_cost;
 		      if (add_cost > 0
@@ -1340,10 +1550,11 @@  find_costs_and_classes (FILE *dump_file)
 		      else
 			COSTS (total_allocno_costs, parent_a_num)->mem_cost
 			  += add_cost;
+
 		    }
-		  for (k = 0; k < cost_classes_num; k++)
+		  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
 		    {
-		      add_cost = COSTS (costs, a_num)->cost[k];
+		      add_cost = COSTS (total_allocno_costs, a_num)->cost[k];
 		      if (add_cost > 0 && INT_MAX - add_cost < temp_costs->cost[k])
 			temp_costs->cost[k] = INT_MAX;
 		      else
@@ -1365,7 +1576,7 @@  find_costs_and_classes (FILE *dump_file)
 	  else if (equiv_savings > 0)
 	    {
 	      temp_costs->mem_cost = 0;
-	      for (k = 0; k < cost_classes_num; k++)
+	      for (k = cost_classes_ptr->num - 1; k >= 0; k--)
 		temp_costs->cost[k] += equiv_savings;
 	    }
 
@@ -1374,7 +1585,7 @@  find_costs_and_classes (FILE *dump_file)
 	  alt_class = NO_REGS;
 	  /* Find best common class for all allocnos with the same
 	     regno.  */
-	  for (k = 0; k < cost_classes_num; k++)
+	  for (k = 0; k < cost_classes_ptr->num; k++)
 	    {
 	      rclass = cost_classes[k];
 	      /* Ignore classes that are too small for this operand or
@@ -1428,6 +1639,7 @@  find_costs_and_classes (FILE *dump_file)
 			 i, reg_class_names[best], reg_class_names[alt_class],
 			 reg_class_names[regno_aclass[i]]);
 	    }
+	  regno_best_class[i] = best;
 	  if (! allocno_p)
 	    {
 	      pref[i] = best_cost > temp_costs->mem_cost ? NO_REGS : best;
@@ -1447,7 +1659,7 @@  find_costs_and_classes (FILE *dump_file)
 		  best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
 		  allocno_cost = best_cost;
 		  best = ALL_REGS;
-		  for (k = 0; k < cost_classes_num; k++)
+		  for (k = 0; k < cost_classes_ptr->num; k++)
 		    {
 		      rclass = cost_classes[k];
 		      if (! ira_class_subset_p[rclass][regno_aclass[i]])
@@ -1465,20 +1677,16 @@  find_costs_and_classes (FILE *dump_file)
 #endif
 			  )
 			;
-		      else if (COSTS (total_allocno_costs, a_num)->cost[k]
-			       < best_cost)
+		      else if (COSTS (total_allocno_costs, a_num)->cost[k] < best_cost)
 			{
-			  best_cost
-			    = COSTS (total_allocno_costs, a_num)->cost[k];
+			  best_cost = COSTS (total_allocno_costs, a_num)->cost[k];
 			  allocno_cost = COSTS (costs, a_num)->cost[k];
 			  best = (enum reg_class) rclass;
 			}
-		      else if (COSTS (total_allocno_costs, a_num)->cost[k]
-			       == best_cost)
+		      else if (COSTS (total_allocno_costs, a_num)->cost[k] == best_cost)
 			{
 			  best = ira_reg_class_subunion[best][rclass];
-			  allocno_cost
-			    = MAX (allocno_cost, COSTS (costs, a_num)->cost[k]);
+			  allocno_cost = MAX (allocno_cost, COSTS (costs, a_num)->cost[k]);
 			}
 		    }
 		  ALLOCNO_CLASS_COST (a) = allocno_cost;
@@ -1499,7 +1707,7 @@  find_costs_and_classes (FILE *dump_file)
 	      pref[a_num] = best;
 	    }
 	}
-
+      
       if (internal_flag_ira_verbose > 4 && dump_file)
 	{
 	  if (allocno_p)
@@ -1509,6 +1717,7 @@  find_costs_and_classes (FILE *dump_file)
 	  fprintf (dump_file,"\n");
 	}
     }
+  ira_free (regno_best_class);
 #ifdef FORBIDDEN_INC_DEC_CLASSES
   ira_free (in_inc_dec);
 #endif
@@ -1585,7 +1794,7 @@  process_bb_node_for_hard_reg_moves (ira_
       ALLOCNO_HARD_REG_COSTS (a)[i] -= cost;
       ALLOCNO_CONFLICT_HARD_REG_COSTS (a)[i] -= cost;
       ALLOCNO_CLASS_COST (a) = MIN (ALLOCNO_CLASS_COST (a),
-				    ALLOCNO_HARD_REG_COSTS (a)[i]);
+					  ALLOCNO_HARD_REG_COSTS (a)[i]);
     }
 }
 
@@ -1595,17 +1804,20 @@  process_bb_node_for_hard_reg_moves (ira_
 static void
 setup_allocno_class_and_costs (void)
 {
-  int i, j, n, hard_regno, num;
+  int i, j, n, regno, hard_regno, num;
   int *reg_costs;
   enum reg_class aclass, rclass;
   ira_allocno_t a;
   ira_allocno_iterator ai;
+  cost_classes_t cost_classes_ptr;
 
   ira_assert (allocno_p);
   FOR_EACH_ALLOCNO (a, ai)
     {
       i = ALLOCNO_NUM (a);
-      aclass = regno_aclass[ALLOCNO_REGNO (a)];
+      regno = ALLOCNO_REGNO (a);
+      aclass = regno_aclass[regno];
+      cost_classes_ptr = regno_cost_classes[regno];
       ira_assert (pref[i] == NO_REGS || aclass != NO_REGS);
       ALLOCNO_MEMORY_COST (a) = COSTS (costs, i)->mem_cost;
       ira_set_allocno_class (a, aclass);
@@ -1624,10 +1836,10 @@  setup_allocno_class_and_costs (void)
 	      else
 		{
 		  rclass = REGNO_REG_CLASS (hard_regno);
-		  num = cost_class_nums[rclass];
+		  num = cost_classes_ptr->index[rclass];
 		  if (num < 0)
 		    {
-		      num = cost_class_nums[aclass];
+		      num = cost_classes_ptr->hard_regno_index[hard_regno];
 		      ira_assert (num >= 0);
 		    }
 		  reg_costs[j] = COSTS (costs, i)->cost[num];
@@ -1655,7 +1867,6 @@  ira_init_costs_once (void)
       this_op_costs[i] = NULL;
     }
   temp_costs = NULL;
-  cost_classes = NULL;
 }
 
 /* Free allocated temporary cost vectors.  */
@@ -1678,9 +1889,6 @@  free_ira_costs (void)
   if (temp_costs != NULL)
     free (temp_costs);
   temp_costs = NULL;
-  if (cost_classes != NULL)
-    free (cost_classes);
-  cost_classes = NULL;
 }
 
 /* This is called each time register related information is
@@ -1705,8 +1913,6 @@  ira_init_costs (void)
       this_op_costs[i] = (struct costs *) xmalloc (max_struct_costs_size);
     }
   temp_costs = (struct costs *) xmalloc (max_struct_costs_size);
-  cost_classes = (enum reg_class *) xmalloc (sizeof (enum reg_class)
-					     * ira_important_classes_num);
 }
 
 /* Function called once at the end of compiler work.  */
@@ -1729,7 +1935,7 @@  init_costs (void)
   pref_buffer = (enum reg_class *) ira_allocate (sizeof (enum reg_class)
 						 * cost_elements_num);
   regno_aclass = (enum reg_class *) ira_allocate (sizeof (enum reg_class)
-						  * max_reg_num ());
+						 * max_reg_num ());
   regno_equiv_gains = (int *) ira_allocate (sizeof (int) * max_reg_num ());
   memset (regno_equiv_gains, 0, sizeof (int) * max_reg_num ());
 }
@@ -1739,8 +1945,8 @@  init_costs (void)
 static void
 finish_costs (void)
 {
-  ira_free (regno_equiv_gains);
   ira_free (regno_aclass);
+  ira_free (regno_equiv_gains);
   ira_free (pref_buffer);
   ira_free (costs);
 }
@@ -1755,9 +1961,11 @@  ira_costs (void)
   init_costs ();
   total_allocno_costs = (struct costs *) ira_allocate (max_struct_costs_size
 						       * ira_allocnos_num);
+  initiate_regno_cost_classes ();
   calculate_elim_costs_all_insns ();
   find_costs_and_classes (ira_dump_file);
   setup_allocno_class_and_costs ();
+  finish_regno_cost_classes ();
   finish_costs ();
   ira_free (total_allocno_costs);
 }
@@ -1770,7 +1978,9 @@  ira_set_pseudo_classes (FILE *dump_file)
   internal_flag_ira_verbose = flag_ira_verbose;
   cost_elements_num = max_reg_num ();
   init_costs ();
+  initiate_regno_cost_classes ();
   find_costs_and_classes (dump_file);
+  finish_regno_cost_classes ();
   pseudo_classes_defined_p = true;
   finish_costs ();
 }