Patchwork Change IVOPTS and strength reduction to use expmed cost model

Submitter William J. Schmidt
Date July 25, 2012, 7:40 p.m.
Message ID <1343245213.4638.21.camel@oc2474580526.ibm.com>
Permalink /patch/173248/
State New

Comments

William J. Schmidt - July 25, 2012, 7:40 p.m.
On Wed, 2012-07-25 at 09:59 -0700, Richard Henderson wrote:
> On 07/25/2012 09:13 AM, William J. Schmidt wrote:
> > Per Richard Henderson's suggestion
> > (http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01370.html), this patch
> > changes the IVOPTS and straight-line strength reduction passes to make
> > use of data computed by init_expmed.  This required adding a new
> > convert_cost array in expmed to store the costs of converting between
> > various scalar integer modes, and exposing expmed's multiplication hash
> > table for external use (new function mult_by_coeff_cost).  Richard H,
> > I'd appreciate it if you could look at what I did there and make sure
> > it's correct.  Thanks!
> 
> Correctness looks good.
> 
> > I decided it wasn't worth distinguishing between reg-reg add costs and
> > reg-constant add costs, so I simplified the strength reduction
> > calculations rather than adding another array to expmed for this
> > purpose.  But I can make this distinction if that's preferable.
> 
> I don't think this is worth thinking about at this level.  This is
> something that some rtl-level optimization ought to be able to fix
> up trivially, e.g. cse.
> 
> > Index: gcc/expmed.h
> > ===================================================================
> > --- gcc/expmed.h	(revision 189845)
> > +++ gcc/expmed.h	(working copy)
> > @@ -155,6 +155,11 @@ struct target_expmed {
> >    int x_udiv_cost[2][NUM_MACHINE_MODES];
> >    int x_mul_widen_cost[2][NUM_MACHINE_MODES];
> >    int x_mul_highpart_cost[2][NUM_MACHINE_MODES];
> > +
> > +  /* Conversion costs are only defined between two scalar integer modes
> > +     of different sizes.  The first machine mode is the destination mode,
> > +     and the second is the source mode.  */
> > +  int x_convert_cost[2][NUM_MACHINE_MODES][NUM_MACHINE_MODES];
> >  };
> 
> 2 * NUM_MACHINE_MODES is quite large...  I think we could do better with
> 
> #define NUM_MODE_INT (MAX_MODE_INT - MIN_MODE_INT + 1)
> 
>   x_convert_cost[2][NUM_MODE_INT][NUM_MODE_INT];
> 
> though really that could be done with all of these fields all at once.
> 
> That does suggest it would be better to leave at least inline functions
> to access these elements, rather than open code the array access.
> 
> 
> r~
> 

Thanks for the quick review!  Excellent point about the array size.  The
attached revised patch follows your suggestion to limit the size.

I only did this for the new field, as changing all the existing
accessors to inline functions is more effort than I have time for right
now.  This is left as an exercise for the reader. ;)
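
To illustrate the size reduction discussed above, here is a minimal
standalone sketch. It is not taken from the patch: the toy_* names and
the tiny mode enum are hypothetical stand-ins for the real machine mode
enumeration, and the only assumption carried over is that the scalar
integer modes form one contiguous range. The table is indexed by
offsets from the first integer mode, and all access goes through small
inline functions rather than open-coded array references.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the machine mode enum: the scalar integer
   modes form one contiguous block among other kinds of modes.  */
enum toy_mode
{
  TOY_VOIDmode,
  TOY_QImode, TOY_HImode, TOY_SImode, TOY_DImode,	/* integer modes */
  TOY_SFmode, TOY_DFmode,				/* float modes */
  TOY_NUM_MACHINE_MODES
};

#define TOY_MIN_MODE_INT TOY_QImode
#define TOY_MAX_MODE_INT TOY_DImode
#define TOY_NUM_MODE_INT (TOY_MAX_MODE_INT - TOY_MIN_MODE_INT + 1)

/* Indexed [speed][to][from]; only integer modes get a slot, so the
   table holds 2 * 4 * 4 entries instead of 2 * 7 * 7.  */
static int toy_convert_cost_table[2][TOY_NUM_MODE_INT][TOY_NUM_MODE_INT];

/* Map an integer mode to its row or column in the compact table.  */
static inline int
toy_mode_int_index (enum toy_mode mode)
{
  assert (mode >= TOY_MIN_MODE_INT && mode <= TOY_MAX_MODE_INT);
  return mode - TOY_MIN_MODE_INT;
}

/* Record COST for converting FROM to TO when optimizing for SPEED.  */
static inline void
toy_set_convert_cost (enum toy_mode to, enum toy_mode from,
		      bool speed, int cost)
{
  int to_idx = toy_mode_int_index (to);
  int from_idx = toy_mode_int_index (from);
  toy_convert_cost_table[speed][to_idx][from_idx] = cost;
}

/* Return the recorded cost for converting FROM to TO.  */
static inline int
toy_convert_cost (enum toy_mode to, enum toy_mode from, bool speed)
{
  int to_idx = toy_mode_int_index (to);
  int from_idx = toy_mode_int_index (from);
  return toy_convert_cost_table[speed][to_idx][from_idx];
}
```

A caller would record a cost with toy_set_convert_cost (TOY_SImode,
TOY_QImode, true, 4) and read it back with toy_convert_cost using the
same arguments. Because callers never index the array directly, the
layout can change later without touching them.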

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
failures.  Is this ok?

Thanks,
Bill


2012-07-25  Bill Schmidt  <wschmidt@linux.ibm.com>

	* tree-ssa-loop-ivopts.c (mbc_entry_hash): Remove.
	(mbc_entry_eq): Likewise.
	(mult_costs): Likewise.
	(cost_tables_exist): Likewise.
	(initialize_costs): Likewise.
	(finalize_costs): Likewise.
	(tree_ssa_iv_optimize_init): Remove call to initialize_costs.
	(add_regs_cost): Remove.
	(multiply_regs_cost): Likewise.
	(add_const_cost): Likewise.
	(extend_or_trunc_reg_cost): Likewise.
	(negate_reg_cost): Likewise.
	(struct mbc_entry): Likewise.
	(multiply_by_const_cost): Likewise.
	(get_address_cost): Change add_regs_cost calls to add_cost lookups;
	change multiply_by_const_cost to mult_by_coeff_cost.
	(force_expr_to_var_cost): Likewise.
	(difference_cost): Change multiply_by_const_cost to mult_by_coeff_cost.
	(get_computation_cost_at): Change add_regs_cost calls to add_cost
	lookups; change multiply_by_const_cost to mult_by_coeff_cost.
	(determine_iv_cost): Change add_regs_cost calls to add_cost lookups.
	(tree_ssa_iv_optimize_finalize): Remove call to finalize_costs.
	* tree-ssa-address.c (expmed.h): New #include.
	(most_expensive_mult_to_index): Change multiply_by_const_cost to
	mult_by_coeff_cost.
	* gimple-ssa-strength-reduction.c (expmed.h): New #include.
	(stmt_cost): Change to use mult_by_coeff_cost, mul_cost, add_cost,
	neg_cost, and convert_cost instead of IVOPTS interfaces.
	(execute_strength_reduction): Remove calls to initialize_costs and
	finalize_costs.
	* expmed.c (struct init_expmed_rtl): Add convert rtx_def.
	(init_expmed_one_mode): Initialize convert rtx_def; initialize
	x_convert_cost for related modes.
	(mult_by_coeff_cost): New function.
	* expmed.h (NUM_MODE_INT): New #define.
	(struct target_expmed): Add x_convert_cost matrix.
	(set_convert_cost): New inline function.
	(convert_cost): Likewise.
	(mult_by_coeff_cost): New extern decl.
	* tree-flow.h (initialize_costs): Remove decl.
	(finalize_costs): Likewise.
	(multiply_by_const_cost): Likewise.
	(add_regs_cost): Likewise.
	(multiply_regs_cost): Likewise.
	(add_const_cost): Likewise.
	(extend_or_trunc_reg_cost): Likewise.
	(negate_reg_cost): Likewise.

Richard Henderson - July 25, 2012, 8:34 p.m.
On 07/25/2012 12:40 PM, William J. Schmidt wrote:
> Thanks for the quick review!  Excellent point about the array size.  The
> attached revised patch follows your suggestion to limit the size.
> 
> I only did this for the new field, as changing all the existing
> accessors to inline functions is more effort than I have time for right
> now.  This is left as an exercise for the reader. ;)

Sure.  ;-)

> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> failures.  Is this ok?

Ok.


r~
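
In the patch, mult_by_coeff_cost takes the cost of a general multiply
as an upper bound, asks choose_mult_variant for a cheaper synthesized
sequence, and returns the general multiply cost if none is found. The
standalone sketch below shows only that "never worse than a plain
multiply" idea. It is a deliberately simplified toy, not GCC's
algorithm: the toy_* names, the cost structure, and the naive
shift-and-add expansion are all assumptions made for illustration, and
the real synthesis considers more strategies than this.

```c
#include <assert.h>

/* Hypothetical per-operation costs; real values come from the target.  */
struct toy_costs
{
  int add;	/* register + register */
  int shift;	/* shift by a constant */
  int neg;	/* negate a register */
  int mul;	/* general register * register multiply */
};

/* Cost of a naive shift-and-add expansion of x * COEFF: one shift for
   each set bit above bit 0, and one add to join each pair of terms.  */
static int
toy_shift_add_cost (unsigned long coeff, const struct toy_costs *c)
{
  int terms = 0, shifts = 0;
  int bit;

  for (bit = 0; coeff != 0; bit++, coeff >>= 1)
    if (coeff & 1)
      {
	terms++;
	if (bit > 0)
	  shifts++;
      }

  if (terms == 0)
    return 0;	/* x * 0 needs no arithmetic in this toy model.  */
  return shifts * c->shift + (terms - 1) * c->add;
}

/* Return the cheaper of the synthesized sequence and a plain multiply,
   so the result never exceeds the general multiply cost.  */
static int
toy_mult_by_coeff_cost (long coeff, const struct toy_costs *c)
{
  unsigned long abs_coeff
    = coeff < 0 ? 0UL - (unsigned long) coeff : (unsigned long) coeff;
  int synth = toy_shift_add_cost (abs_coeff, c);

  if (coeff < 0)
    synth += c->neg;	/* Negate the synthesized product.  */

  return synth < c->mul ? synth : c->mul;
}
```

With every simple operation costing 4 and a general multiply costing
16, multiplying by 8 is modelled as a single shift, multiplying by 5 as
a shift and an add, and multiplying by 255 falls back to the multiply
cost because the naive expansion would be more expensive.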

Patch

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 189845)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -88,9 +88,6 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "expmed.h"
 
-static hashval_t mbc_entry_hash (const void *);
-static int mbc_entry_eq (const void*, const void *);
-
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
    interface between the GIMPLE and RTL worlds.  */
@@ -381,11 +378,6 @@  struct iv_ca_delta
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
-/* Cached costs for multiplies by constants, and a flag to indicate
-   when they're valid.  */
-static htab_t mult_costs[2];
-static bool cost_tables_exist = false;
-
 static comp_cost force_expr_to_var_cost (tree, bool);
 
 /* Number of uses recorded in DATA.  */
@@ -851,26 +843,6 @@  htab_inv_expr_hash (const void *ent)
   return expr->hash;
 }
 
-/* Allocate data structures for the cost model.  */
-
-void
-initialize_costs (void)
-{
-  mult_costs[0] = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
-  mult_costs[1] = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
-  cost_tables_exist = true;
-}
-
-/* Release data structures for the cost model.  */
-
-void
-finalize_costs (void)
-{
-  cost_tables_exist = false;
-  htab_delete (mult_costs[0]);
-  htab_delete (mult_costs[1]);
-}
-
 /* Initializes data structures used by the iv optimization pass, stored
    in DATA.  */
 
@@ -889,8 +861,6 @@  tree_ssa_iv_optimize_init (struct ivopts_data *dat
                                     htab_inv_expr_eq, free);
   data->inv_expr_id = 0;
   decl_rtl_to_reset = VEC_alloc (tree, heap, 20);
-
-  initialize_costs ();
 }
 
 /* Returns a memory object to that EXPR points.  In case we are able to
@@ -3077,250 +3047,6 @@  adjust_setup_cost (struct ivopts_data *data, unsig
     return cost;
 }
 
-/* Returns cost of addition in MODE.  */
-
-unsigned
-add_regs_cost (enum machine_mode mode, bool speed)
-{
-  static unsigned costs[NUM_MACHINE_MODES][2];
-  rtx seq;
-  unsigned cost;
-
-  if (costs[mode][speed])
-    return costs[mode][speed];
-
-  start_sequence ();
-  force_operand (gen_rtx_fmt_ee (PLUS, mode,
-				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
-				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)),
-		 NULL_RTX);
-  seq = get_insns ();
-  end_sequence ();
-
-  cost = seq_cost (seq, speed);
-  if (!cost)
-    cost = 1;
-
-  costs[mode][speed] = cost;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Addition in %s costs %d\n",
-	     GET_MODE_NAME (mode), cost);
-  return cost;
-}
-
-/* Returns cost of multiplication in MODE.  */
-
-unsigned
-multiply_regs_cost (enum machine_mode mode, bool speed)
-{
-  static unsigned costs[NUM_MACHINE_MODES][2];
-  rtx seq;
-  unsigned cost;
-
-  if (costs[mode][speed])
-    return costs[mode][speed];
-
-  start_sequence ();
-  force_operand (gen_rtx_fmt_ee (MULT, mode,
-				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
-				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)),
-		 NULL_RTX);
-  seq = get_insns ();
-  end_sequence ();
-
-  cost = seq_cost (seq, speed);
-  if (!cost)
-    cost = 1;
-
-  costs[mode][speed] = cost;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Multiplication in %s costs %d\n",
-	     GET_MODE_NAME (mode), cost);
-  return cost;
-}
-
-/* Returns cost of addition with a constant in MODE.  */
-
-unsigned
-add_const_cost (enum machine_mode mode, bool speed)
-{
-  static unsigned costs[NUM_MACHINE_MODES][2];
-  rtx seq;
-  unsigned cost;
-
-  if (costs[mode][speed])
-    return costs[mode][speed];
-
-  /* Arbitrarily generate insns for x + 2, as the exact constant
-     shouldn't matter.  */
-  start_sequence ();
-  force_operand (gen_rtx_fmt_ee (PLUS, mode,
-				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
-				 gen_int_mode (2, mode)),
-		 NULL_RTX);
-  seq = get_insns ();
-  end_sequence ();
-
-  cost = seq_cost (seq, speed);
-  if (!cost)
-    cost = 1;
-
-  costs[mode][speed] = cost;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Addition to constant in %s costs %d\n",
-	     GET_MODE_NAME (mode), cost);
-  return cost;
-}
-
-/* Returns cost of extend or truncate in MODE.  */
-
-unsigned
-extend_or_trunc_reg_cost (tree type_to, tree type_from, bool speed)
-{
-  static unsigned costs[NUM_MACHINE_MODES][NUM_MACHINE_MODES][2];
-  rtx seq;
-  unsigned cost;
-  enum machine_mode mode_to = TYPE_MODE (type_to);
-  enum machine_mode mode_from = TYPE_MODE (type_from);
-  tree size_to = TYPE_SIZE (type_to);
-  tree size_from = TYPE_SIZE (type_from);
-  enum rtx_code code;
-
-  gcc_assert (TREE_CODE (size_to) == INTEGER_CST
-	      && TREE_CODE (size_from) == INTEGER_CST);
-
-  if (costs[mode_to][mode_from][speed])
-    return costs[mode_to][mode_from][speed];
-
-  if (tree_int_cst_lt (size_to, size_from))
-    code = TRUNCATE;
-  else if (TYPE_UNSIGNED (type_to))
-    code = ZERO_EXTEND;
-  else
-    code = SIGN_EXTEND;
-
-  start_sequence ();
-  gen_rtx_fmt_e (code, mode_to,
-		 gen_raw_REG (mode_from, LAST_VIRTUAL_REGISTER + 1));
-  seq = get_insns ();
-  end_sequence ();
-
-  cost = seq_cost (seq, speed);
-  if (!cost)
-    cost = 1;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Conversion from %s to %s costs %d\n",
-	     GET_MODE_NAME (mode_to), GET_MODE_NAME (mode_from), cost);
-
-  costs[mode_to][mode_from][speed] = cost;
-  return cost;
-}
-
-/* Returns cost of negation in MODE.  */
-
-unsigned
-negate_reg_cost (enum machine_mode mode, bool speed)
-{
-  static unsigned costs[NUM_MACHINE_MODES][2];
-  rtx seq;
-  unsigned cost;
-
-  if (costs[mode][speed])
-    return costs[mode][speed];
-
-  start_sequence ();
-  force_operand (gen_rtx_fmt_e (NEG, mode,
-				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
-		 NULL_RTX);
-  seq = get_insns ();
-  end_sequence ();
-
-  cost = seq_cost (seq, speed);
-  if (!cost)
-    cost = 1;
-
-  costs[mode][speed] = cost;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Negation in %s costs %d\n",
-	     GET_MODE_NAME (mode), cost);
-  return cost;
-}
-
-/* Entry in a hashtable of already known costs for multiplication.  */
-struct mbc_entry
-{
-  HOST_WIDE_INT cst;		/* The constant to multiply by.  */
-  enum machine_mode mode;	/* In mode.  */
-  unsigned cost;		/* The cost.  */
-};
-
-/* Counts hash value for the ENTRY.  */
-
-static hashval_t
-mbc_entry_hash (const void *entry)
-{
-  const struct mbc_entry *e = (const struct mbc_entry *) entry;
-
-  return 57 * (hashval_t) e->mode + (hashval_t) (e->cst % 877);
-}
-
-/* Compares the hash table entries ENTRY1 and ENTRY2.  */
-
-static int
-mbc_entry_eq (const void *entry1, const void *entry2)
-{
-  const struct mbc_entry *e1 = (const struct mbc_entry *) entry1;
-  const struct mbc_entry *e2 = (const struct mbc_entry *) entry2;
-
-  return (e1->mode == e2->mode
-	  && e1->cst == e2->cst);
-}
-
-/* Returns cost of multiplication by constant CST in MODE.  */
-
-unsigned
-multiply_by_const_cost (HOST_WIDE_INT cst, enum machine_mode mode, bool speed)
-{
-  struct mbc_entry **cached, act;
-  rtx seq;
-  unsigned cost;
-
-  gcc_assert (cost_tables_exist);
-
-  act.mode = mode;
-  act.cst = cst;
-  cached = (struct mbc_entry **)
-    htab_find_slot (mult_costs[speed], &act, INSERT);
-    
-  if (*cached)
-    return (*cached)->cost;
-
-  *cached = XNEW (struct mbc_entry);
-  (*cached)->mode = mode;
-  (*cached)->cst = cst;
-
-  start_sequence ();
-  expand_mult (mode, gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
-	       gen_int_mode (cst, mode), NULL_RTX, 0);
-  seq = get_insns ();
-  end_sequence ();
-
-  cost = seq_cost (seq, speed);
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Multiplication by %d in %s costs %d\n",
-	     (int) cst, GET_MODE_NAME (mode), cost);
-
-  (*cached)->cost = cost;
-
-  return cost;
-}
-
 /* Returns true if multiplying by RATIO is allowed in an address.  Test the
    validity for a memory reference accessing memory of mode MODE in
    address space AS.  */
@@ -3582,7 +3308,7 @@  get_address_cost (bool symbol_present, bool var_pr
 	 If VAR_PRESENT is true, try whether the mode with
 	 SYMBOL_PRESENT = false is cheaper even with cost of addition, and
 	 if this is the case, use it.  */
-      add_c = add_regs_cost (address_mode, speed);
+      add_c = add_cost[speed][address_mode];
       for (i = 0; i < 8; i++)
 	{
 	  var_p = i & 1;
@@ -3663,10 +3389,10 @@  get_address_cost (bool symbol_present, bool var_pr
 	     && multiplier_allowed_in_address_p (ratio, mem_mode, as));
 
   if (ratio != 1 && !ratio_p)
-    cost += multiply_by_const_cost (ratio, address_mode, speed);
+    cost += mult_by_coeff_cost (ratio, address_mode, speed);
 
   if (s_offset && !offset_p && !symbol_present)
-    cost += add_regs_cost (address_mode, speed);
+    cost += add_cost[speed][address_mode];
 
   if (may_autoinc)
     *may_autoinc = autoinc;
@@ -3833,7 +3559,7 @@  force_expr_to_var_cost (tree expr, bool speed)
     case PLUS_EXPR:
     case MINUS_EXPR:
     case NEGATE_EXPR:
-      cost = new_cost (add_regs_cost (mode, speed), 0);
+      cost = new_cost (add_cost[speed][mode], 0);
       if (TREE_CODE (expr) != NEGATE_EXPR)
         {
           tree mult = NULL_TREE;
@@ -3853,11 +3579,11 @@  force_expr_to_var_cost (tree expr, bool speed)
 
     case MULT_EXPR:
       if (cst_and_fits_in_hwi (op0))
-	cost = new_cost (multiply_by_const_cost (int_cst_value (op0),
-						 mode, speed), 0);
+	cost = new_cost (mult_by_coeff_cost (int_cst_value (op0),
+					     mode, speed), 0);
       else if (cst_and_fits_in_hwi (op1))
-	cost = new_cost (multiply_by_const_cost (int_cst_value (op1),
-						 mode, speed), 0);
+	cost = new_cost (mult_by_coeff_cost (int_cst_value (op1),
+					     mode, speed), 0);
       else
 	return new_cost (target_spill_cost [speed], 0);
       break;
@@ -4023,7 +3749,7 @@  difference_cost (struct ivopts_data *data,
   if (integer_zerop (e1))
     {
       comp_cost cost = force_var_cost (data, e2, depends_on);
-      cost.cost += multiply_by_const_cost (-1, mode, data->speed);
+      cost.cost += mult_by_coeff_cost (-1, mode, data->speed);
       return cost;
     }
 
@@ -4334,7 +4060,7 @@  get_computation_cost_at (struct ivopts_data *data,
 					 &symbol_present, &var_present,
 					 &offset, depends_on));
       cost.cost /= avg_loop_niter (data->current_loop);
-      cost.cost += add_regs_cost (TYPE_MODE (ctype), data->speed);
+      cost.cost += add_cost[data->speed][TYPE_MODE (ctype)];
     }
 
   if (inv_expr_id)
@@ -4367,7 +4093,7 @@  get_computation_cost_at (struct ivopts_data *data,
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
-	cost.cost += multiply_by_const_cost (ratio, TYPE_MODE (ctype), speed);
+	cost.cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
       return cost;
     }
 
@@ -4375,18 +4101,18 @@  get_computation_cost_at (struct ivopts_data *data,
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += adjust_setup_cost (data,
-				    add_regs_cost (TYPE_MODE (ctype), speed));
+				    add_cost[speed][TYPE_MODE (ctype)]);
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
   if (offset)
     cost.complexity++;
 
-  cost.cost += add_regs_cost (TYPE_MODE (ctype), speed);
+  cost.cost += add_cost[speed][TYPE_MODE (ctype)];
 
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
-    cost.cost += multiply_by_const_cost (aratio, TYPE_MODE (ctype), speed);
+    cost.cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
   return cost;
 
 fallback:
@@ -5232,7 +4958,7 @@  determine_iv_cost (struct ivopts_data *data, struc
      or a const set.  */
   if (cost_base.cost == 0)
     cost_base.cost = COSTS_N_INSNS (1);
-  cost_step = add_regs_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
+  cost_step = add_cost[data->speed][TYPE_MODE (TREE_TYPE (base))];
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
 
@@ -6804,8 +6530,6 @@  tree_ssa_iv_optimize_finalize (struct ivopts_data
   VEC_free (iv_use_p, heap, data->iv_uses);
   VEC_free (iv_cand_p, heap, data->iv_candidates);
   htab_delete (data->inv_expr_tab);
-
-  finalize_costs ();
 }
 
 /* Returns true if the loop body BODY includes any function calls.  */
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	(revision 189845)
+++ gcc/tree-ssa-address.c	(working copy)
@@ -42,6 +42,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "expr.h"
 #include "ggc.h"
 #include "target.h"
+#include "expmed.h"
 
 /* TODO -- handling of symbols (according to Richard Hendersons
    comments, http://gcc.gnu.org/ml/gcc-patches/2005-04/msg00949.html):
@@ -554,7 +555,7 @@  most_expensive_mult_to_index (tree type, struct me
 	  || !multiplier_allowed_in_address_p (coef, TYPE_MODE (type), as))
 	continue;
 
-      acost = multiply_by_const_cost (coef, address_mode, speed);
+      acost = mult_by_coeff_cost (coef, address_mode, speed);
 
       if (acost > best_mult_cost)
 	{
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 189845)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -54,6 +54,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-flow.h"
 #include "domwalk.h"
 #include "pointer-set.h"
+#include "expmed.h"
 
 /* Information about a strength reduction candidate.  Each statement
    in the candidate table represents an expression of one of the
@@ -340,29 +341,22 @@  stmt_cost (gimple gs, bool speed)
       rhs2 = gimple_assign_rhs2 (gs);
 
       if (host_integerp (rhs2, 0))
-	return multiply_by_const_cost (TREE_INT_CST_LOW (rhs2), lhs_mode,
-				       speed);
+	return mult_by_coeff_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
 
       gcc_assert (TREE_CODE (rhs1) != INTEGER_CST);
-      return multiply_regs_cost (TYPE_MODE (TREE_TYPE (lhs)), speed);
+      return mul_cost[speed][lhs_mode];
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
     case MINUS_EXPR:
       rhs2 = gimple_assign_rhs2 (gs);
+      return add_cost[speed][lhs_mode];
 
-      if (host_integerp (rhs2, 0))
-	return add_const_cost (TYPE_MODE (TREE_TYPE (rhs1)), speed);
-
-      gcc_assert (TREE_CODE (rhs1) != INTEGER_CST);
-      return add_regs_cost (lhs_mode, speed);
-
     case NEGATE_EXPR:
-      return negate_reg_cost (lhs_mode, speed);
+      return neg_cost[speed][lhs_mode];
 
     case NOP_EXPR:
-      return extend_or_trunc_reg_cost (TREE_TYPE (lhs), TREE_TYPE (rhs1),
-				       speed);
+      return convert_cost (lhs_mode, TYPE_MODE (TREE_TYPE (rhs1)), speed);
 
     /* Note that we don't assign costs to copies that in most cases
        will go away.  */
@@ -1460,9 +1454,6 @@  execute_strength_reduction (void)
      back edges, and this gives us dominator information as well.  */
   loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
 
-  /* Initialize costs tables in IVOPTS.  */
-  initialize_costs ();
-
   /* Set up callbacks for the generic dominator tree walker.  */
   walk_data.dom_direction = CDI_DOMINATORS;
   walk_data.initialize_block_local_data = NULL;
@@ -1493,7 +1484,6 @@  execute_strength_reduction (void)
   pointer_map_destroy (stmt_cand_map);
   VEC_free (slsr_cand_t, heap, cand_vec);
   obstack_free (&cand_obstack, NULL);
-  finalize_costs ();
 
   return 0;
 }
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	(revision 189845)
+++ gcc/expmed.c	(working copy)
@@ -112,6 +112,7 @@  struct init_expmed_rtl
   struct rtx_def shift_add;	rtunion shift_add_fld1;
   struct rtx_def shift_sub0;	rtunion shift_sub0_fld1;
   struct rtx_def shift_sub1;	rtunion shift_sub1_fld1;
+  struct rtx_def convert;
 
   rtx pow2[MAX_BITS_PER_WORD];
   rtx cint[MAX_BITS_PER_WORD];
@@ -122,6 +123,7 @@  init_expmed_one_mode (struct init_expmed_rtl *all,
 		      enum machine_mode mode, int speed)
 {
   int m, n, mode_bitsize;
+  enum machine_mode mode_from;
 
   mode_bitsize = GET_MODE_UNIT_BITSIZE (mode);
 
@@ -139,6 +141,7 @@  init_expmed_one_mode (struct init_expmed_rtl *all,
   PUT_MODE (&all->shift_add, mode);
   PUT_MODE (&all->shift_sub0, mode);
   PUT_MODE (&all->shift_sub1, mode);
+  PUT_MODE (&all->convert, mode);
 
   add_cost[speed][mode] = set_src_cost (&all->plus, speed);
   neg_cost[speed][mode] = set_src_cost (&all->neg, speed);
@@ -183,6 +186,30 @@  init_expmed_one_mode (struct init_expmed_rtl *all,
 	  mul_highpart_cost[speed][mode]
 	    = set_src_cost (&all->wide_trunc, speed);
 	}
+
+      for (mode_from = GET_CLASS_NARROWEST_MODE (MODE_INT);
+	   mode_from != VOIDmode;
+	   mode_from = GET_MODE_WIDER_MODE (mode_from))
+	if (mode != mode_from)
+	  {
+	    unsigned short size_to = GET_MODE_SIZE (mode);
+	    unsigned short size_from = GET_MODE_SIZE (mode_from);
+	    if (size_to < size_from)
+	      {
+		PUT_CODE (&all->convert, TRUNCATE);
+		PUT_MODE (&all->reg, mode_from);
+		set_convert_cost (mode, mode_from, speed,
+				  set_src_cost (&all->convert, speed));
+	      }
+	    else if (size_from < size_to)
+	      {
+		/* Assume cost of zero-extend and sign-extend is the same.  */
+		PUT_CODE (&all->convert, ZERO_EXTEND);
+		PUT_MODE (&all->reg, mode_from);
+		set_convert_cost (mode, mode_from, speed,
+				  set_src_cost (&all->convert, speed));
+	      }
+	  }
     }
 }
 
@@ -262,6 +289,9 @@  init_expmed (void)
   XEXP (&all.shift_sub1, 0) = &all.reg;
   XEXP (&all.shift_sub1, 1) = &all.shift_mult;
 
+  PUT_CODE (&all.convert, TRUNCATE);
+  XEXP (&all.convert, 0) = &all.reg;
+
   for (speed = 0; speed < 2; speed++)
     {
       crtl->maybe_hot_insn_p = speed;
@@ -3262,6 +3292,24 @@  expand_mult (enum machine_mode mode, rtx op0, rtx
   return op0;
 }
 
+/* Return a cost estimate for multiplying a register by the given
+   COEFFicient in the given MODE and SPEED.  */
+
+int
+mult_by_coeff_cost (HOST_WIDE_INT coeff, enum machine_mode mode, bool speed)
+{
+  int max_cost;
+  struct algorithm algorithm;
+  enum mult_variant variant;
+
+  rtx fake_reg = gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1);
+  max_cost = set_src_cost (gen_rtx_MULT (mode, fake_reg, fake_reg), speed);
+  if (choose_mult_variant (mode, coeff, &algorithm, &variant, max_cost))
+    return algorithm.cost.cost;
+  else
+    return max_cost;
+}
+
 /* Perform a widening multiplication and return an rtx for the result.
    MODE is mode of value; OP0 and OP1 are what to multiply (rtx's);
    TARGET is a suggestion for where to store the result (an rtx).
Index: gcc/expmed.h
===================================================================
--- gcc/expmed.h	(revision 189845)
+++ gcc/expmed.h	(working copy)
@@ -124,6 +124,8 @@  struct alg_hash_entry {
 #define NUM_ALG_HASH_ENTRIES 307
 #endif
 
+#define NUM_MODE_INT (MAX_MODE_INT - MIN_MODE_INT + 1)
+
 /* Target-dependent globals.  */
 struct target_expmed {
   /* Each entry of ALG_HASH caches alg_code for some integer.  This is
@@ -155,6 +157,11 @@  struct target_expmed {
   int x_udiv_cost[2][NUM_MACHINE_MODES];
   int x_mul_widen_cost[2][NUM_MACHINE_MODES];
   int x_mul_highpart_cost[2][NUM_MACHINE_MODES];
+
+  /* Conversion costs are only defined between two scalar integer modes
+     of different sizes.  The first machine mode is the destination mode,
+     and the second is the source mode.  */
+  int x_convert_cost[2][NUM_MODE_INT][NUM_MODE_INT];
 };
 
 extern struct target_expmed default_target_expmed;
@@ -197,4 +204,43 @@  extern struct target_expmed *this_target_expmed;
 #define mul_highpart_cost \
   (this_target_expmed->x_mul_highpart_cost)
 
+/* Set the COST for converting from FROM_MODE to TO_MODE when optimizing
+   for SPEED.  */
+
+static inline void
+set_convert_cost (enum machine_mode to_mode, enum machine_mode from_mode,
+		  bool speed, int cost)
+{
+  int to_idx, from_idx;
+
+  gcc_assert (to_mode >= MIN_MODE_INT
+	      && to_mode <= MAX_MODE_INT
+	      && from_mode >= MIN_MODE_INT
+	      && from_mode <= MAX_MODE_INT);
+
+  to_idx = to_mode - MIN_MODE_INT;
+  from_idx = from_mode - MIN_MODE_INT;
+  this_target_expmed->x_convert_cost[speed][to_idx][from_idx] = cost;
+}
+
+/* Return the cost for converting from FROM_MODE to TO_MODE when optimizing
+   for SPEED.  */
+
+static inline int
+convert_cost (enum machine_mode to_mode, enum machine_mode from_mode,
+	      bool speed)
+{
+  int to_idx, from_idx;
+
+  gcc_assert (to_mode >= MIN_MODE_INT
+	      && to_mode <= MAX_MODE_INT
+	      && from_mode >= MIN_MODE_INT
+	      && from_mode <= MAX_MODE_INT);
+
+  to_idx = to_mode - MIN_MODE_INT;
+  from_idx = from_mode - MIN_MODE_INT;
+  return this_target_expmed->x_convert_cost[speed][to_idx][from_idx];
+}
+
+extern int mult_by_coeff_cost (HOST_WIDE_INT, enum machine_mode, bool);
 #endif
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 189845)
+++ gcc/tree-flow.h	(working copy)
@@ -806,14 +806,6 @@  bool expr_invariant_in_loop_p (struct loop *, tree
 bool stmt_invariant_in_loop_p (struct loop *, gimple);
 bool multiplier_allowed_in_address_p (HOST_WIDE_INT, enum machine_mode,
 				      addr_space_t);
-void initialize_costs (void);
-void finalize_costs (void);
-unsigned multiply_by_const_cost (HOST_WIDE_INT, enum machine_mode, bool);
-unsigned add_regs_cost (enum machine_mode, bool);
-unsigned multiply_regs_cost (enum machine_mode, bool);
-unsigned add_const_cost (enum machine_mode, bool);
-unsigned extend_or_trunc_reg_cost (tree, tree, bool);
-unsigned negate_reg_cost (enum machine_mode, bool);
 bool may_be_nonaddressable_p (tree expr);
 
 /* In tree-ssa-threadupdate.c.  */