Patchwork Strength reduction preliminaries

login
register
mail settings
Submitter William J. Schmidt
Date June 21, 2012, 4:01 p.m.
Message ID <1340294463.2813.46.camel@gnopaine>
Download mbox | patch
Permalink /patch/166350/
State New
Headers show

Comments

William J. Schmidt - June 21, 2012, 4:01 p.m.
As promised, this breaks out the changes to the IVOPTS cost model and
the added function in double-int.c.  Please let me know if you would
rather see me attempt to consolidate the IVOPTS logic into expmed.c per
Richard H's suggestion.

I ran into a glitch with multiply_by_const_cost.  The original code
declared a static htab_t in the function and allocated it on demand.
When I tried adding a second one in the same manner, I ran into a
locking problem in the memory management library code during a call to
delete_htab.  The original implementation seemed a bit dicey to me
anyway, so I changed this to explicitly allocate and deallocate the hash
tables on (entry to/exit from) IVOPTS.

This reduces the scope of the hash table from a compilation unit to each
individual function.  If it's preferred to maintain compilation unit
scope, then the initialization/finalization of the htabs can be pushed
out to do_compile.  But I doubt it's worth that.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
regressions.  Ok for trunk?

Thanks,
Bill


2012-06-21  Bill Schmidt  <wschmidt@linux.ibm.com>

	* double-int.c (double_int_multiple_of): New function.
	* double-int.h (double_int_multiple_of): New decl.
	* tree-ssa-loop-ivopts.c (add_cost, zero_cost): Remove undefs.
	(mbc_entry_hash): New forward decl.
	(mbc_entry_eq): Likewise.
	(zero_cost): Change to no_cost.
	(mult_costs): New static var.
	(tree_ssa_iv_optimize_init): Initialize mult_costs.
	(add_cost): Change to add_regs_cost; distinguish costs by speed.
	(multiply_regs_cost): New function.
	(add_const_cost): Likewise.
	(extend_or_trunc_reg_cost): Likewise.
	(negate_reg_cost): Likewise.
	(multiply_by_cost): Change to multiply_by_const_cost; distinguish
	costs by speed.
	(get_address_cost): Change add_cost to add_regs_cost; change
	multiply_by_cost to multiply_by_const_cost.
	(force_expr_to_var_cost): Change zero_cost to no_cost; change
	add_cost to add_regs_cost; change multiply_by_cost to
	multiply_by_const_cost.
	(split_cost): Change zero_cost to no_cost.
	(ptr_difference_cost): Likewise.
	(difference_cost): Change zero_cost to no_cost; change multiply_by_cost
	to multiply_by_const_cost.
	(get_computation_cost_at): Change add_cost to add_regs_cost; change
	multiply_by_cost to multiply_by_const_cost.
	(determine_use_iv_cost_generic): Change zero_cost to no_cost.
	(determine_iv_cost): Change add_cost to add_regs_cost.
	(iv_ca_new): Change zero_cost to no_cost.
	(tree_ssa_iv_optimize_finalize): Release storage for mult_costs.
	* tree-ssa-address.c (most_expensive_mult_to_index): Change
	multiply_by_cost to multiply_by_const_cost.
	* tree-flow.h (multiply_by_cost): Change to multiply_by_const_cost.
	(add_regs_cost): New decl.
	(multiply_regs_cost): Likewise.
	(add_const_cost): Likewise.
	(extend_or_trunc_reg_cost): Likewise.
	(negate_reg_cost): Likewise.
Richard Guenther - June 22, 2012, 8:44 a.m.
On Thu, 21 Jun 2012, William J. Schmidt wrote:

> As promised, this breaks out the changes to the IVOPTS cost model and
> the added function in double-int.c.  Please let me know if you would
> rather see me attempt to consolidate the IVOPTS logic into expmed.c per
> Richard H's suggestion.

If we start to use it from multiple places that definitely makes sense,
but you can move the stuff as a followup.

> I ran into a glitch with multiply_by_const_cost.  The original code
> declared a static htab_t in the function and allocated it on demand.
> When I tried adding a second one in the same manner, I ran into a
> locking problem in the memory management library code during a call to
> delete_htab.  The original implementation seemed a bit dicey to me
> anyway, so I changed this to explicitly allocate and deallocate the hash
> tables on (entry to/exit from) IVOPTS.

Huh.  That's weird and should not happen.  Still it makes sense to
move this to a per-function cache given that its size is basically
unbound.

Can you introduce a initialize_costs () / finalize_costs () function
pair that allocates / frees the tables and sets a global flag that
you can then assert in the functions using those tables?

> +  if (speed)
> +    speed = 1;

I suppose this is because bool is not bool when building with a
C compiler?  It really looks weird and if such is necessary I'd
prefer something like

> +add_regs_cost (enum machine_mode mode, bool speed)
>  {
> +  static unsigned costs[NUM_MACHINE_MODES][2];
>    rtx seq;
>    unsigned cost;
     unsigned sidx = speed ? 0 : 1;
>  
> +  if (costs[mode][sidx])
> +    return costs[mode][sidx];
> +

instead.

Otherwise the patch is ok.

Thanks,
Richard.

> This reduces the scope of the hash table from a compilation unit to each
> individual function.  If it's preferred to maintain compilation unit
> scope, then the initialization/finalization of the htabs can be pushed
> out to do_compile.  But I doubt it's worth that.
> 
> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> regressions.  Ok for trunk?
> 
> Thanks,
> Bill
> 
> 
> 2012-06-21  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> 	* double-int.c (double_int_multiple_of): New function.
> 	* double-int.h (double_int_multiple_of): New decl.
> 	* tree-ssa-loop-ivopts.c (add_cost, zero_cost): Remove undefs.
> 	(mbc_entry_hash): New forward decl.
> 	(mbc_entry_eq): Likewise.
> 	(zero_cost): Change to no_cost.
> 	(mult_costs): New static var.
> 	(tree_ssa_iv_optimize_init): Initialize mult_costs.
> 	(add_cost): Change to add_regs_cost; distinguish costs by speed.
> 	(multiply_regs_cost): New function.
> 	(add_const_cost): Likewise.
> 	(extend_or_trunc_reg_cost): Likewise.
> 	(negate_reg_cost): Likewise.
> 	(multiply_by_cost): Change to multiply_by_const_cost; distinguish
> 	costs by speed.
> 	(get_address_cost): Change add_cost to add_regs_cost; change
> 	multiply_by_cost to multiply_by_const_cost.
> 	(force_expr_to_var_cost): Change zero_cost to no_cost; change
> 	add_cost to add_regs_cost; change multiply_by_cost to
> 	multiply_by_const_cost.
> 	(split_cost): Change zero_cost to no_cost.
> 	(ptr_difference_cost): Likewise.
> 	(difference_cost): Change zero_cost to no_cost; change multiply_by_cost
> 	to multiply_by_const_cost.
> 	(get_computation_cost_at): Change add_cost to add_regs_cost; change
> 	multiply_by_cost to multiply_by_const_cost.
> 	(determine_use_iv_cost_generic): Change zero_cost to no_cost.
> 	(determine_iv_cost): Change add_cost to add_regs_cost.
> 	(iv_ca_new): Change zero_cost to no_cost.
> 	(tree_ssa_iv_optimize_finalize): Release storage for mult_costs.
> 	* tree-ssa-address.c (most_expensive_mult_to_index): Change
> 	multiply_by_cost to multiply_by_const_cost.
> 	* tree-flow.h (multiply_by_cost): Change to multiply_by_const_cost.
> 	(add_regs_cost): New decl.
> 	(multiply_regs_cost): Likewise.
> 	(add_const_cost): Likewise.
> 	(extend_or_trunc_reg_cost): Likewise.
> 	(negate_reg_cost): Likewise.
> 
> 
> Index: gcc/double-int.c
> ===================================================================
> --- gcc/double-int.c	(revision 188839)
> +++ gcc/double-int.c	(working copy)
> @@ -865,6 +865,26 @@ double_int_umod (double_int a, double_int b, unsig
>    return double_int_mod (a, b, true, code);
>  }
>  
> +/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
> +   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
> +   unchanged.  */
> +
> +bool
> +double_int_multiple_of (double_int product, double_int factor,
> +			bool unsigned_p, double_int *multiple)
> +{
> +  double_int remainder;
> +  double_int quotient = double_int_divmod (product, factor, unsigned_p,
> +					   TRUNC_DIV_EXPR, &remainder);
> +  if (double_int_zero_p (remainder))
> +    {
> +      *multiple = quotient;
> +      return true;
> +    }
> +
> +  return false;
> +}
> +
>  /* Set BITPOS bit in A.  */
>  double_int
>  double_int_setbit (double_int a, unsigned bitpos)
> Index: gcc/double-int.h
> ===================================================================
> --- gcc/double-int.h	(revision 188839)
> +++ gcc/double-int.h	(working copy)
> @@ -150,6 +150,8 @@ double_int double_int_divmod (double_int, double_i
>  double_int double_int_sdivmod (double_int, double_int, unsigned, double_int *);
>  double_int double_int_udivmod (double_int, double_int, unsigned, double_int *);
>  
> +bool double_int_multiple_of (double_int, double_int, bool, double_int *);
> +
>  double_int double_int_setbit (double_int, unsigned);
>  int double_int_ctz (double_int);
>  
> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c	(revision 188839)
> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> @@ -89,13 +89,11 @@ along with GCC; see the file COPYING3.  If not see
>  #include "target.h"
>  #include "tree-inline.h"
>  #include "tree-ssa-propagate.h"
> -
> -/* FIXME: add_cost and zero_cost defined in exprmed.h conflict with local uses.
> - */
>  #include "expmed.h"
> -#undef add_cost
> -#undef zero_cost
>  
> +static hashval_t mbc_entry_hash (const void *);
> +static int mbc_entry_eq (const void*, const void *);
> +
>  /* FIXME: Expressions are expanded to RTL in this pass to determine the
>     cost of different addressing modes.  This should be moved to a TBD
>     interface between the GIMPLE and RTL worlds.  */
> @@ -162,7 +160,7 @@ typedef struct
>  			   complex expressions and addressing modes).  */
>  } comp_cost;
>  
> -static const comp_cost zero_cost = {0, 0};
> +static const comp_cost no_cost = {0, 0};
>  static const comp_cost infinite_cost = {INFTY, INFTY};
>  
>  /* The candidate - cost pair.  */
> @@ -386,6 +384,9 @@ struct iv_ca_delta
>  
>  static VEC(tree,heap) *decl_rtl_to_reset;
>  
> +/* Cached costs for multiplies by constants.  */
> +static htab_t mult_costs[2];
> +
>  static comp_cost force_expr_to_var_cost (tree, bool);
>  
>  /* Number of uses recorded in DATA.  */
> @@ -869,6 +870,9 @@ tree_ssa_iv_optimize_init (struct ivopts_data *dat
>                                      htab_inv_expr_eq, free);
>    data->inv_expr_id = 0;
>    decl_rtl_to_reset = VEC_alloc (tree, heap, 20);
> +
> +  mult_costs[0] = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
> +  mult_costs[1] = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
>  }
>  
>  /* Returns a memory object to that EXPR points.  In case we are able to
> @@ -3057,16 +3061,19 @@ adjust_setup_cost (struct ivopts_data *data, unsig
>  
>  /* Returns cost of addition in MODE.  */
>  
> -static unsigned
> -add_cost (enum machine_mode mode, bool speed)
> +unsigned
> +add_regs_cost (enum machine_mode mode, bool speed)
>  {
> -  static unsigned costs[NUM_MACHINE_MODES];
> +  static unsigned costs[NUM_MACHINE_MODES][2];
>    rtx seq;
>    unsigned cost;
>  
> -  if (costs[mode])
> -    return costs[mode];
> +  if (speed)
> +    speed = 1;
>  
> +  if (costs[mode][speed])
> +    return costs[mode][speed];
> +
>    start_sequence ();
>    force_operand (gen_rtx_fmt_ee (PLUS, mode,
>  				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
> @@ -3079,7 +3086,7 @@ adjust_setup_cost (struct ivopts_data *data, unsig
>    if (!cost)
>      cost = 1;
>  
> -  costs[mode] = cost;
> +  costs[mode][speed] = cost;
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      fprintf (dump_file, "Addition in %s costs %d\n",
> @@ -3087,6 +3094,160 @@ adjust_setup_cost (struct ivopts_data *data, unsig
>    return cost;
>  }
>  
> +/* Returns cost of multiplication in MODE.  */
> +
> +unsigned
> +multiply_regs_cost (enum machine_mode mode, bool speed)
> +{
> +  static unsigned costs[NUM_MACHINE_MODES][2];
> +  rtx seq;
> +  unsigned cost;
> +
> +  if (speed)
> +    speed = 1;
> +
> +  if (costs[mode][speed])
> +    return costs[mode][speed];
> +
> +  start_sequence ();
> +  force_operand (gen_rtx_fmt_ee (MULT, mode,
> +				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
> +				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)),
> +		 NULL_RTX);
> +  seq = get_insns ();
> +  end_sequence ();
> +
> +  cost = seq_cost (seq, speed);
> +  if (!cost)
> +    cost = 1;
> +
> +  costs[mode][speed] = cost;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "Multiplication in %s costs %d\n",
> +	     GET_MODE_NAME (mode), cost);
> +  return cost;
> +}
> +
> +/* Returns cost of addition with a constant in MODE.  */
> +
> +unsigned
> +add_const_cost (enum machine_mode mode, bool speed)
> +{
> +  static unsigned costs[NUM_MACHINE_MODES][2];
> +  rtx seq;
> +  unsigned cost;
> +
> +  if (speed)
> +    speed = 1;
> +
> +  if (costs[mode][speed])
> +    return costs[mode][speed];
> +
> +  /* Arbitrarily generate insns for x + 2, as the exact constant
> +     shouldn't matter.  */
> +  start_sequence ();
> +  force_operand (gen_rtx_fmt_ee (PLUS, mode,
> +				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
> +				 gen_int_mode (2, mode)),
> +		 NULL_RTX);
> +  seq = get_insns ();
> +  end_sequence ();
> +
> +  cost = seq_cost (seq, speed);
> +  if (!cost)
> +    cost = 1;
> +
> +  costs[mode][speed] = cost;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "Addition to constant in %s costs %d\n",
> +	     GET_MODE_NAME (mode), cost);
> +  return cost;
> +}
> +
> +/* Returns cost of extend or truncate in MODE.  */
> +
> +unsigned
> +extend_or_trunc_reg_cost (tree type_to, tree type_from, bool speed)
> +{
> +  static unsigned costs[NUM_MACHINE_MODES][NUM_MACHINE_MODES][2];
> +  rtx seq;
> +  unsigned cost;
> +  enum machine_mode mode_to = TYPE_MODE (type_to);
> +  enum machine_mode mode_from = TYPE_MODE (type_from);
> +  tree size_to = TYPE_SIZE (type_to);
> +  tree size_from = TYPE_SIZE (type_from);
> +  enum rtx_code code;
> +
> +  gcc_assert (TREE_CODE (size_to) == INTEGER_CST
> +	      && TREE_CODE (size_from) == INTEGER_CST);
> +
> +  if (speed)
> +    speed = 1;
> +
> +  if (costs[mode_to][mode_from][speed])
> +    return costs[mode_to][mode_from][speed];
> +
> +  if (tree_int_cst_lt (size_to, size_from))
> +    code = TRUNCATE;
> +  else if (TYPE_UNSIGNED (type_to))
> +    code = ZERO_EXTEND;
> +  else
> +    code = SIGN_EXTEND;
> +
> +  start_sequence ();
> +  gen_rtx_fmt_e (code, mode_to,
> +		 gen_raw_REG (mode_from, LAST_VIRTUAL_REGISTER + 1));
> +  seq = get_insns ();
> +  end_sequence ();
> +
> +  cost = seq_cost (seq, speed);
> +  if (!cost)
> +    cost = 1;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "Conversion from %s to %s costs %d\n",
> +	     GET_MODE_NAME (mode_to), GET_MODE_NAME (mode_from), cost);
> +
> +  costs[mode_to][mode_from][speed] = cost;
> +  return cost;
> +}
> +
> +/* Returns cost of negation in MODE.  */
> +
> +unsigned
> +negate_reg_cost (enum machine_mode mode, bool speed)
> +{
> +  static unsigned costs[NUM_MACHINE_MODES][2];
> +  rtx seq;
> +  unsigned cost;
> +
> +  if (speed)
> +    speed = 1;
> +
> +  if (costs[mode][speed])
> +    return costs[mode][speed];
> +
> +  start_sequence ();
> +  force_operand (gen_rtx_fmt_e (NEG, mode,
> +				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
> +		 NULL_RTX);
> +  seq = get_insns ();
> +  end_sequence ();
> +
> +  cost = seq_cost (seq, speed);
> +  if (!cost)
> +    cost = 1;
> +
> +  costs[mode][speed] = cost;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "Negation in %s costs %d\n",
> +	     GET_MODE_NAME (mode), cost);
> +  return cost;
> +}
> +
>  /* Entry in a hashtable of already known costs for multiplication.  */
>  struct mbc_entry
>  {
> @@ -3120,19 +3281,20 @@ mbc_entry_eq (const void *entry1, const void *entr
>  /* Returns cost of multiplication by constant CST in MODE.  */
>  
>  unsigned
> -multiply_by_cost (HOST_WIDE_INT cst, enum machine_mode mode, bool speed)
> +multiply_by_const_cost (HOST_WIDE_INT cst, enum machine_mode mode, bool speed)
>  {
> -  static htab_t costs;
>    struct mbc_entry **cached, act;
>    rtx seq;
>    unsigned cost;
>  
> -  if (!costs)
> -    costs = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
> +  if (speed)
> +    speed = 1;
>  
>    act.mode = mode;
>    act.cst = cst;
> -  cached = (struct mbc_entry **) htab_find_slot (costs, &act, INSERT);
> +  cached = (struct mbc_entry **)
> +    htab_find_slot (mult_costs[speed], &act, INSERT);
> +    
>    if (*cached)
>      return (*cached)->cost;
>  
> @@ -3418,7 +3580,7 @@ get_address_cost (bool symbol_present, bool var_pr
>  	 If VAR_PRESENT is true, try whether the mode with
>  	 SYMBOL_PRESENT = false is cheaper even with cost of addition, and
>  	 if this is the case, use it.  */
> -      add_c = add_cost (address_mode, speed);
> +      add_c = add_regs_cost (address_mode, speed);
>        for (i = 0; i < 8; i++)
>  	{
>  	  var_p = i & 1;
> @@ -3499,10 +3661,10 @@ get_address_cost (bool symbol_present, bool var_pr
>  	     && multiplier_allowed_in_address_p (ratio, mem_mode, as));
>  
>    if (ratio != 1 && !ratio_p)
> -    cost += multiply_by_cost (ratio, address_mode, speed);
> +    cost += multiply_by_const_cost (ratio, address_mode, speed);
>  
>    if (s_offset && !offset_p && !symbol_present)
> -    cost += add_cost (address_mode, speed);
> +    cost += add_regs_cost (address_mode, speed);
>  
>    if (may_autoinc)
>      *may_autoinc = autoinc;
> @@ -3601,7 +3763,7 @@ force_expr_to_var_cost (tree expr, bool speed)
>    STRIP_NOPS (expr);
>  
>    if (SSA_VAR_P (expr))
> -    return zero_cost;
> +    return no_cost;
>  
>    if (is_gimple_min_invariant (expr))
>      {
> @@ -3633,12 +3795,12 @@ force_expr_to_var_cost (tree expr, bool speed)
>        STRIP_NOPS (op1);
>  
>        if (is_gimple_val (op0))
> -	cost0 = zero_cost;
> +	cost0 = no_cost;
>        else
>  	cost0 = force_expr_to_var_cost (op0, speed);
>  
>        if (is_gimple_val (op1))
> -	cost1 = zero_cost;
> +	cost1 = no_cost;
>        else
>  	cost1 = force_expr_to_var_cost (op1, speed);
>  
> @@ -3650,11 +3812,11 @@ force_expr_to_var_cost (tree expr, bool speed)
>        op1 = NULL_TREE;
>  
>        if (is_gimple_val (op0))
> -	cost0 = zero_cost;
> +	cost0 = no_cost;
>        else
>  	cost0 = force_expr_to_var_cost (op0, speed);
>  
> -      cost1 = zero_cost;
> +      cost1 = no_cost;
>        break;
>  
>      default:
> @@ -3669,7 +3831,7 @@ force_expr_to_var_cost (tree expr, bool speed)
>      case PLUS_EXPR:
>      case MINUS_EXPR:
>      case NEGATE_EXPR:
> -      cost = new_cost (add_cost (mode, speed), 0);
> +      cost = new_cost (add_regs_cost (mode, speed), 0);
>        if (TREE_CODE (expr) != NEGATE_EXPR)
>          {
>            tree mult = NULL_TREE;
> @@ -3689,9 +3851,11 @@ force_expr_to_var_cost (tree expr, bool speed)
>  
>      case MULT_EXPR:
>        if (cst_and_fits_in_hwi (op0))
> -	cost = new_cost (multiply_by_cost (int_cst_value (op0), mode, speed), 0);
> +	cost = new_cost (multiply_by_const_cost (int_cst_value (op0),
> +						 mode, speed), 0);
>        else if (cst_and_fits_in_hwi (op1))
> -	cost = new_cost (multiply_by_cost (int_cst_value (op1), mode, speed), 0);
> +	cost = new_cost (multiply_by_const_cost (int_cst_value (op1),
> +						 mode, speed), 0);
>        else
>  	return new_cost (target_spill_cost [speed], 0);
>        break;
> @@ -3766,12 +3930,12 @@ split_address_cost (struct ivopts_data *data,
>      {
>        *symbol_present = true;
>        *var_present = false;
> -      return zero_cost;
> +      return no_cost;
>      }
>  
>    *symbol_present = false;
>    *var_present = true;
> -  return zero_cost;
> +  return no_cost;
>  }
>  
>  /* Estimates cost of expressing difference of addresses E1 - E2 as
> @@ -3796,7 +3960,7 @@ ptr_difference_cost (struct ivopts_data *data,
>        *offset += diff;
>        *symbol_present = false;
>        *var_present = false;
> -      return zero_cost;
> +      return no_cost;
>      }
>  
>    if (integer_zerop (e2))
> @@ -3846,7 +4010,7 @@ difference_cost (struct ivopts_data *data,
>    if (operand_equal_p (e1, e2, 0))
>      {
>        *var_present = false;
> -      return zero_cost;
> +      return no_cost;
>      }
>  
>    *var_present = true;
> @@ -3857,7 +4021,7 @@ difference_cost (struct ivopts_data *data,
>    if (integer_zerop (e1))
>      {
>        comp_cost cost = force_var_cost (data, e2, depends_on);
> -      cost.cost += multiply_by_cost (-1, mode, data->speed);
> +      cost.cost += multiply_by_const_cost (-1, mode, data->speed);
>        return cost;
>      }
>  
> @@ -4168,7 +4332,7 @@ get_computation_cost_at (struct ivopts_data *data,
>  					 &symbol_present, &var_present,
>  					 &offset, depends_on));
>        cost.cost /= avg_loop_niter (data->current_loop);
> -      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
> +      cost.cost += add_regs_cost (TYPE_MODE (ctype), data->speed);
>      }
>  
>    if (inv_expr_id)
> @@ -4201,7 +4365,7 @@ get_computation_cost_at (struct ivopts_data *data,
>    if (!symbol_present && !var_present && !offset)
>      {
>        if (ratio != 1)
> -	cost.cost += multiply_by_cost (ratio, TYPE_MODE (ctype), speed);
> +	cost.cost += multiply_by_const_cost (ratio, TYPE_MODE (ctype), speed);
>        return cost;
>      }
>  
> @@ -4209,18 +4373,18 @@ get_computation_cost_at (struct ivopts_data *data,
>        are added once to the variable, if present.  */
>    if (var_present && (symbol_present || offset))
>      cost.cost += adjust_setup_cost (data,
> -				    add_cost (TYPE_MODE (ctype), speed));
> +				    add_regs_cost (TYPE_MODE (ctype), speed));
>  
>    /* Having offset does not affect runtime cost in case it is added to
>       symbol, but it increases complexity.  */
>    if (offset)
>      cost.complexity++;
>  
> -  cost.cost += add_cost (TYPE_MODE (ctype), speed);
> +  cost.cost += add_regs_cost (TYPE_MODE (ctype), speed);
>  
>    aratio = ratio > 0 ? ratio : -ratio;
>    if (aratio != 1)
> -    cost.cost += multiply_by_cost (aratio, TYPE_MODE (ctype), speed);
> +    cost.cost += multiply_by_const_cost (aratio, TYPE_MODE (ctype), speed);
>    return cost;
>  
>  fallback:
> @@ -4277,7 +4441,7 @@ determine_use_iv_cost_generic (struct ivopts_data
>    if (cand->pos == IP_ORIGINAL
>        && cand->incremented_at == use->stmt)
>      {
> -      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE,
> +      set_use_iv_cost (data, use, cand, no_cost, NULL, NULL_TREE,
>                         ERROR_MARK, -1);
>        return true;
>      }
> @@ -5066,7 +5230,7 @@ determine_iv_cost (struct ivopts_data *data, struc
>       or a const set.  */
>    if (cost_base.cost == 0)
>      cost_base.cost = COSTS_N_INSNS (1);
> -  cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
> +  cost_step = add_regs_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
>  
>    cost = cost_step + adjust_setup_cost (data, cost_base.cost);
>  
> @@ -5561,10 +5725,10 @@ iv_ca_new (struct ivopts_data *data)
>    nw->cands = BITMAP_ALLOC (NULL);
>    nw->n_cands = 0;
>    nw->n_regs = 0;
> -  nw->cand_use_cost = zero_cost;
> +  nw->cand_use_cost = no_cost;
>    nw->cand_cost = 0;
>    nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
> -  nw->cost = zero_cost;
> +  nw->cost = no_cost;
>    nw->used_inv_expr = XCNEWVEC (unsigned, data->inv_expr_id + 1);
>    nw->num_used_inv_expr = 0;
>  
> @@ -6638,6 +6802,8 @@ tree_ssa_iv_optimize_finalize (struct ivopts_data
>    VEC_free (iv_use_p, heap, data->iv_uses);
>    VEC_free (iv_cand_p, heap, data->iv_candidates);
>    htab_delete (data->inv_expr_tab);
> +  htab_delete (mult_costs[0]);
> +  htab_delete (mult_costs[1]);
>  }
>  
>  /* Returns true if the loop body BODY includes any function calls.  */
> Index: gcc/tree-ssa-address.c
> ===================================================================
> --- gcc/tree-ssa-address.c	(revision 188839)
> +++ gcc/tree-ssa-address.c	(working copy)
> @@ -556,7 +556,7 @@ most_expensive_mult_to_index (tree type, struct me
>  	  || !multiplier_allowed_in_address_p (coef, TYPE_MODE (type), as))
>  	continue;
>  
> -      acost = multiply_by_cost (coef, address_mode, speed);
> +      acost = multiply_by_const_cost (coef, address_mode, speed);
>  
>        if (acost > best_mult_cost)
>  	{
> Index: gcc/tree-flow.h
> ===================================================================
> --- gcc/tree-flow.h	(revision 188839)
> +++ gcc/tree-flow.h	(working copy)
> @@ -810,7 +810,12 @@ bool expr_invariant_in_loop_p (struct loop *, tree
>  bool stmt_invariant_in_loop_p (struct loop *, gimple);
>  bool multiplier_allowed_in_address_p (HOST_WIDE_INT, enum machine_mode,
>  				      addr_space_t);
> -unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
> +unsigned multiply_by_const_cost (HOST_WIDE_INT, enum machine_mode, bool);
> +unsigned add_regs_cost (enum machine_mode, bool);
> +unsigned multiply_regs_cost (enum machine_mode, bool);
> +unsigned add_const_cost (enum machine_mode, bool);
> +unsigned extend_or_trunc_reg_cost (tree, tree, bool);
> +unsigned negate_reg_cost (enum machine_mode, bool);
>  bool may_be_nonaddressable_p (tree expr);
>  
>  /* In tree-ssa-threadupdate.c.  */
> 
> 
>
William J. Schmidt - June 22, 2012, 11:44 a.m.
On Fri, 2012-06-22 at 10:44 +0200, Richard Guenther wrote:
> On Thu, 21 Jun 2012, William J. Schmidt wrote:
> 
> > As promised, this breaks out the changes to the IVOPTS cost model and
> > the added function in double-int.c.  Please let me know if you would
> > rather see me attempt to consolidate the IVOPTS logic into expmed.c per
> > Richard H's suggestion.
> 
> If we start to use it from multiple places that definitely makes sense,
> but you can move the stuff as a followup.

OK, I'll put it on my list.

> 
> > I ran into a glitch with multiply_by_const_cost.  The original code
> > declared a static htab_t in the function and allocated it on demand.
> > When I tried adding a second one in the same manner, I ran into a
> > locking problem in the memory management library code during a call to
> > delete_htab.  The original implementation seemed a bit dicey to me
> > anyway, so I changed this to explicitly allocate and deallocate the hash
> > tables on (entry to/exit from) IVOPTS.
> 
> Huh.  That's weird and should not happen.  Still it makes sense to
> move this to a per-function cache given that its size is basically
> unbound.
> 
> Can you introduce a initialize_costs () / finalize_costs () function
> pair that allocates / frees the tables and sets a global flag that
> you can then assert in the functions using those tables?

Ok.

> 
> > +  if (speed)
> > +    speed = 1;
> 
> I suppose this is because bool is not bool when building with a
> C compiler?  It really looks weird and if such is necessary I'd
> prefer something like
> 
> > +add_regs_cost (enum machine_mode mode, bool speed)
> >  {
> > +  static unsigned costs[NUM_MACHINE_MODES][2];
> >    rtx seq;
> >    unsigned cost;
>      unsigned sidx = speed ? 0 : 1;
> >  
> > +  if (costs[mode][sidx])
> > +    return costs[mode][sidx];
> > +
> 
> instead.

I'm always paranoid about misuse of bools in C, but I suppose this is
overkill.  I'll just remove the code.

Thanks,
Bill

> 
> Otherwise the patch is ok.
> 
> Thanks,
> Richard.
> 
> > This reduces the scope of the hash table from a compilation unit to each
> > individual function.  If it's preferred to maintain compilation unit
> > scope, then the initialization/finalization of the htabs can be pushed
> > out to do_compile.  But I doubt it's worth that.
> > 
> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> > regressions.  Ok for trunk?
> > 
> > Thanks,
> > Bill
> > 
> > 
> > 2012-06-21  Bill Schmidt  <wschmidt@linux.ibm.com>
> > 
> > 	* double-int.c (double_int_multiple_of): New function.
> > 	* double-int.h (double_int_multiple_of): New decl.
> > 	* tree-ssa-loop-ivopts.c (add_cost, zero_cost): Remove undefs.
> > 	(mbc_entry_hash): New forward decl.
> > 	(mbc_entry_eq): Likewise.
> > 	(zero_cost): Change to no_cost.
> > 	(mult_costs): New static var.
> > 	(tree_ssa_iv_optimize_init): Initialize mult_costs.
> > 	(add_cost): Change to add_regs_cost; distinguish costs by speed.
> > 	(multiply_regs_cost): New function.
> > 	(add_const_cost): Likewise.
> > 	(extend_or_trunc_reg_cost): Likewise.
> > 	(negate_reg_cost): Likewise.
> > 	(multiply_by_cost): Change to multiply_by_const_cost; distinguish
> > 	costs by speed.
> > 	(get_address_cost): Change add_cost to add_regs_cost; change
> > 	multiply_by_cost to multiply_by_const_cost.
> > 	(force_expr_to_var_cost): Change zero_cost to no_cost; change
> > 	add_cost to add_regs_cost; change multiply_by_cost to
> > 	multiply_by_const_cost.
> > 	(split_cost): Change zero_cost to no_cost.
> > 	(ptr_difference_cost): Likewise.
> > 	(difference_cost): Change zero_cost to no_cost; change multiply_by_cost
> > 	to multiply_by_const_cost.
> > 	(get_computation_cost_at): Change add_cost to add_regs_cost; change
> > 	multiply_by_cost to multiply_by_const_cost.
> > 	(determine_use_iv_cost_generic): Change zero_cost to no_cost.
> > 	(determine_iv_cost): Change add_cost to add_regs_cost.
> > 	(iv_ca_new): Change zero_cost to no_cost.
> > 	(tree_ssa_iv_optimize_finalize): Release storage for mult_costs.
> > 	* tree-ssa-address.c (most_expensive_mult_to_index): Change
> > 	multiply_by_cost to multiply_by_const_cost.
> > 	* tree-flow.h (multiply_by_cost): Change to multiply_by_const_cost.
> > 	(add_regs_cost): New decl.
> > 	(multiply_regs_cost): Likewise.
> > 	(add_const_cost): Likewise.
> > 	(extend_or_trunc_reg_cost): Likewise.
> > 	(negate_reg_cost): Likewise.
> > 
> > 
> > Index: gcc/double-int.c
> > ===================================================================
> > --- gcc/double-int.c	(revision 188839)
> > +++ gcc/double-int.c	(working copy)
> > @@ -865,6 +865,26 @@ double_int_umod (double_int a, double_int b, unsig
> >    return double_int_mod (a, b, true, code);
> >  }
> >  
> > +/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
> > +   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
> > +   unchanged.  */
> > +
> > +bool
> > +double_int_multiple_of (double_int product, double_int factor,
> > +			bool unsigned_p, double_int *multiple)
> > +{
> > +  double_int remainder;
> > +  double_int quotient = double_int_divmod (product, factor, unsigned_p,
> > +					   TRUNC_DIV_EXPR, &remainder);
> > +  if (double_int_zero_p (remainder))
> > +    {
> > +      *multiple = quotient;
> > +      return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> >  /* Set BITPOS bit in A.  */
> >  double_int
> >  double_int_setbit (double_int a, unsigned bitpos)
> > Index: gcc/double-int.h
> > ===================================================================
> > --- gcc/double-int.h	(revision 188839)
> > +++ gcc/double-int.h	(working copy)
> > @@ -150,6 +150,8 @@ double_int double_int_divmod (double_int, double_i
> >  double_int double_int_sdivmod (double_int, double_int, unsigned, double_int *);
> >  double_int double_int_udivmod (double_int, double_int, unsigned, double_int *);
> >  
> > +bool double_int_multiple_of (double_int, double_int, bool, double_int *);
> > +
> >  double_int double_int_setbit (double_int, unsigned);
> >  int double_int_ctz (double_int);
> >  
> > Index: gcc/tree-ssa-loop-ivopts.c
> > ===================================================================
> > --- gcc/tree-ssa-loop-ivopts.c	(revision 188839)
> > +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> > @@ -89,13 +89,11 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "target.h"
> >  #include "tree-inline.h"
> >  #include "tree-ssa-propagate.h"
> > -
> > -/* FIXME: add_cost and zero_cost defined in exprmed.h conflict with local uses.
> > - */
> >  #include "expmed.h"
> > -#undef add_cost
> > -#undef zero_cost
> >  
> > +static hashval_t mbc_entry_hash (const void *);
> > +static int mbc_entry_eq (const void*, const void *);
> > +
> >  /* FIXME: Expressions are expanded to RTL in this pass to determine the
> >     cost of different addressing modes.  This should be moved to a TBD
> >     interface between the GIMPLE and RTL worlds.  */
> > @@ -162,7 +160,7 @@ typedef struct
> >  			   complex expressions and addressing modes).  */
> >  } comp_cost;
> >  
> > -static const comp_cost zero_cost = {0, 0};
> > +static const comp_cost no_cost = {0, 0};
> >  static const comp_cost infinite_cost = {INFTY, INFTY};
> >  
> >  /* The candidate - cost pair.  */
> > @@ -386,6 +384,9 @@ struct iv_ca_delta
> >  
> >  static VEC(tree,heap) *decl_rtl_to_reset;
> >  
> > +/* Cached costs for multiplies by constants.  */
> > +static htab_t mult_costs[2];
> > +
> >  static comp_cost force_expr_to_var_cost (tree, bool);
> >  
> >  /* Number of uses recorded in DATA.  */
> > @@ -869,6 +870,9 @@ tree_ssa_iv_optimize_init (struct ivopts_data *dat
> >                                      htab_inv_expr_eq, free);
> >    data->inv_expr_id = 0;
> >    decl_rtl_to_reset = VEC_alloc (tree, heap, 20);
> > +
> > +  mult_costs[0] = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
> > +  mult_costs[1] = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
> >  }
> >  
> >  /* Returns a memory object to that EXPR points.  In case we are able to
> > @@ -3057,16 +3061,19 @@ adjust_setup_cost (struct ivopts_data *data, unsig
> >  
> >  /* Returns cost of addition in MODE.  */
> >  
> > -static unsigned
> > -add_cost (enum machine_mode mode, bool speed)
> > +unsigned
> > +add_regs_cost (enum machine_mode mode, bool speed)
> >  {
> > -  static unsigned costs[NUM_MACHINE_MODES];
> > +  static unsigned costs[NUM_MACHINE_MODES][2];
> >    rtx seq;
> >    unsigned cost;
> >  
> > -  if (costs[mode])
> > -    return costs[mode];
> > +  if (speed)
> > +    speed = 1;
> >  
> > +  if (costs[mode][speed])
> > +    return costs[mode][speed];
> > +
> >    start_sequence ();
> >    force_operand (gen_rtx_fmt_ee (PLUS, mode,
> >  				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
> > @@ -3079,7 +3086,7 @@ adjust_setup_cost (struct ivopts_data *data, unsig
> >    if (!cost)
> >      cost = 1;
> >  
> > -  costs[mode] = cost;
> > +  costs[mode][speed] = cost;
> >  
> >    if (dump_file && (dump_flags & TDF_DETAILS))
> >      fprintf (dump_file, "Addition in %s costs %d\n",
> > @@ -3087,6 +3094,160 @@ adjust_setup_cost (struct ivopts_data *data, unsig
> >    return cost;
> >  }
> >  
> > +/* Returns cost of multiplication in MODE.  */
> > +
> > +unsigned
> > +multiply_regs_cost (enum machine_mode mode, bool speed)
> > +{
> > +  static unsigned costs[NUM_MACHINE_MODES][2];
> > +  rtx seq;
> > +  unsigned cost;
> > +
> > +  if (speed)
> > +    speed = 1;
> > +
> > +  if (costs[mode][speed])
> > +    return costs[mode][speed];
> > +
> > +  start_sequence ();
> > +  force_operand (gen_rtx_fmt_ee (MULT, mode,
> > +				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
> > +				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)),
> > +		 NULL_RTX);
> > +  seq = get_insns ();
> > +  end_sequence ();
> > +
> > +  cost = seq_cost (seq, speed);
> > +  if (!cost)
> > +    cost = 1;
> > +
> > +  costs[mode][speed] = cost;
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    fprintf (dump_file, "Multiplication in %s costs %d\n",
> > +	     GET_MODE_NAME (mode), cost);
> > +  return cost;
> > +}
> > +
> > +/* Returns cost of addition with a constant in MODE.  */
> > +
> > +unsigned
> > +add_const_cost (enum machine_mode mode, bool speed)
> > +{
> > +  static unsigned costs[NUM_MACHINE_MODES][2];
> > +  rtx seq;
> > +  unsigned cost;
> > +
> > +  if (speed)
> > +    speed = 1;
> > +
> > +  if (costs[mode][speed])
> > +    return costs[mode][speed];
> > +
> > +  /* Arbitrarily generate insns for x + 2, as the exact constant
> > +     shouldn't matter.  */
> > +  start_sequence ();
> > +  force_operand (gen_rtx_fmt_ee (PLUS, mode,
> > +				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
> > +				 gen_int_mode (2, mode)),
> > +		 NULL_RTX);
> > +  seq = get_insns ();
> > +  end_sequence ();
> > +
> > +  cost = seq_cost (seq, speed);
> > +  if (!cost)
> > +    cost = 1;
> > +
> > +  costs[mode][speed] = cost;
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    fprintf (dump_file, "Addition to constant in %s costs %d\n",
> > +	     GET_MODE_NAME (mode), cost);
> > +  return cost;
> > +}
> > +
> > +/* Returns cost of extend or truncate in MODE.  */
> > +
> > +unsigned
> > +extend_or_trunc_reg_cost (tree type_to, tree type_from, bool speed)
> > +{
> > +  static unsigned costs[NUM_MACHINE_MODES][NUM_MACHINE_MODES][2];
> > +  rtx seq;
> > +  unsigned cost;
> > +  enum machine_mode mode_to = TYPE_MODE (type_to);
> > +  enum machine_mode mode_from = TYPE_MODE (type_from);
> > +  tree size_to = TYPE_SIZE (type_to);
> > +  tree size_from = TYPE_SIZE (type_from);
> > +  enum rtx_code code;
> > +
> > +  gcc_assert (TREE_CODE (size_to) == INTEGER_CST
> > +	      && TREE_CODE (size_from) == INTEGER_CST);
> > +
> > +  if (speed)
> > +    speed = 1;
> > +
> > +  if (costs[mode_to][mode_from][speed])
> > +    return costs[mode_to][mode_from][speed];
> > +
> > +  if (tree_int_cst_lt (size_to, size_from))
> > +    code = TRUNCATE;
> > +  else if (TYPE_UNSIGNED (type_to))
> > +    code = ZERO_EXTEND;
> > +  else
> > +    code = SIGN_EXTEND;
> > +
> > +  start_sequence ();
> > +  gen_rtx_fmt_e (code, mode_to,
> > +		 gen_raw_REG (mode_from, LAST_VIRTUAL_REGISTER + 1));
> > +  seq = get_insns ();
> > +  end_sequence ();
> > +
> > +  cost = seq_cost (seq, speed);
> > +  if (!cost)
> > +    cost = 1;
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    fprintf (dump_file, "Conversion from %s to %s costs %d\n",
> > +	     GET_MODE_NAME (mode_to), GET_MODE_NAME (mode_from), cost);
> > +
> > +  costs[mode_to][mode_from][speed] = cost;
> > +  return cost;
> > +}
> > +
> > +/* Returns cost of negation in MODE.  */
> > +
> > +unsigned
> > +negate_reg_cost (enum machine_mode mode, bool speed)
> > +{
> > +  static unsigned costs[NUM_MACHINE_MODES][2];
> > +  rtx seq;
> > +  unsigned cost;
> > +
> > +  if (speed)
> > +    speed = 1;
> > +
> > +  if (costs[mode][speed])
> > +    return costs[mode][speed];
> > +
> > +  start_sequence ();
> > +  force_operand (gen_rtx_fmt_e (NEG, mode,
> > +				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
> > +		 NULL_RTX);
> > +  seq = get_insns ();
> > +  end_sequence ();
> > +
> > +  cost = seq_cost (seq, speed);
> > +  if (!cost)
> > +    cost = 1;
> > +
> > +  costs[mode][speed] = cost;
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    fprintf (dump_file, "Negation in %s costs %d\n",
> > +	     GET_MODE_NAME (mode), cost);
> > +  return cost;
> > +}
> > +
> >  /* Entry in a hashtable of already known costs for multiplication.  */
> >  struct mbc_entry
> >  {
> > @@ -3120,19 +3281,20 @@ mbc_entry_eq (const void *entry1, const void *entr
> >  /* Returns cost of multiplication by constant CST in MODE.  */
> >  
> >  unsigned
> > -multiply_by_cost (HOST_WIDE_INT cst, enum machine_mode mode, bool speed)
> > +multiply_by_const_cost (HOST_WIDE_INT cst, enum machine_mode mode, bool speed)
> >  {
> > -  static htab_t costs;
> >    struct mbc_entry **cached, act;
> >    rtx seq;
> >    unsigned cost;
> >  
> > -  if (!costs)
> > -    costs = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
> > +  if (speed)
> > +    speed = 1;
> >  
> >    act.mode = mode;
> >    act.cst = cst;
> > -  cached = (struct mbc_entry **) htab_find_slot (costs, &act, INSERT);
> > +  cached = (struct mbc_entry **)
> > +    htab_find_slot (mult_costs[speed], &act, INSERT);
> > +    
> >    if (*cached)
> >      return (*cached)->cost;
> >  
> > @@ -3418,7 +3580,7 @@ get_address_cost (bool symbol_present, bool var_pr
> >  	 If VAR_PRESENT is true, try whether the mode with
> >  	 SYMBOL_PRESENT = false is cheaper even with cost of addition, and
> >  	 if this is the case, use it.  */
> > -      add_c = add_cost (address_mode, speed);
> > +      add_c = add_regs_cost (address_mode, speed);
> >        for (i = 0; i < 8; i++)
> >  	{
> >  	  var_p = i & 1;
> > @@ -3499,10 +3661,10 @@ get_address_cost (bool symbol_present, bool var_pr
> >  	     && multiplier_allowed_in_address_p (ratio, mem_mode, as));
> >  
> >    if (ratio != 1 && !ratio_p)
> > -    cost += multiply_by_cost (ratio, address_mode, speed);
> > +    cost += multiply_by_const_cost (ratio, address_mode, speed);
> >  
> >    if (s_offset && !offset_p && !symbol_present)
> > -    cost += add_cost (address_mode, speed);
> > +    cost += add_regs_cost (address_mode, speed);
> >  
> >    if (may_autoinc)
> >      *may_autoinc = autoinc;
> > @@ -3601,7 +3763,7 @@ force_expr_to_var_cost (tree expr, bool speed)
> >    STRIP_NOPS (expr);
> >  
> >    if (SSA_VAR_P (expr))
> > -    return zero_cost;
> > +    return no_cost;
> >  
> >    if (is_gimple_min_invariant (expr))
> >      {
> > @@ -3633,12 +3795,12 @@ force_expr_to_var_cost (tree expr, bool speed)
> >        STRIP_NOPS (op1);
> >  
> >        if (is_gimple_val (op0))
> > -	cost0 = zero_cost;
> > +	cost0 = no_cost;
> >        else
> >  	cost0 = force_expr_to_var_cost (op0, speed);
> >  
> >        if (is_gimple_val (op1))
> > -	cost1 = zero_cost;
> > +	cost1 = no_cost;
> >        else
> >  	cost1 = force_expr_to_var_cost (op1, speed);
> >  
> > @@ -3650,11 +3812,11 @@ force_expr_to_var_cost (tree expr, bool speed)
> >        op1 = NULL_TREE;
> >  
> >        if (is_gimple_val (op0))
> > -	cost0 = zero_cost;
> > +	cost0 = no_cost;
> >        else
> >  	cost0 = force_expr_to_var_cost (op0, speed);
> >  
> > -      cost1 = zero_cost;
> > +      cost1 = no_cost;
> >        break;
> >  
> >      default:
> > @@ -3669,7 +3831,7 @@ force_expr_to_var_cost (tree expr, bool speed)
> >      case PLUS_EXPR:
> >      case MINUS_EXPR:
> >      case NEGATE_EXPR:
> > -      cost = new_cost (add_cost (mode, speed), 0);
> > +      cost = new_cost (add_regs_cost (mode, speed), 0);
> >        if (TREE_CODE (expr) != NEGATE_EXPR)
> >          {
> >            tree mult = NULL_TREE;
> > @@ -3689,9 +3851,11 @@ force_expr_to_var_cost (tree expr, bool speed)
> >  
> >      case MULT_EXPR:
> >        if (cst_and_fits_in_hwi (op0))
> > -	cost = new_cost (multiply_by_cost (int_cst_value (op0), mode, speed), 0);
> > +	cost = new_cost (multiply_by_const_cost (int_cst_value (op0),
> > +						 mode, speed), 0);
> >        else if (cst_and_fits_in_hwi (op1))
> > -	cost = new_cost (multiply_by_cost (int_cst_value (op1), mode, speed), 0);
> > +	cost = new_cost (multiply_by_const_cost (int_cst_value (op1),
> > +						 mode, speed), 0);
> >        else
> >  	return new_cost (target_spill_cost [speed], 0);
> >        break;
> > @@ -3766,12 +3930,12 @@ split_address_cost (struct ivopts_data *data,
> >      {
> >        *symbol_present = true;
> >        *var_present = false;
> > -      return zero_cost;
> > +      return no_cost;
> >      }
> >  
> >    *symbol_present = false;
> >    *var_present = true;
> > -  return zero_cost;
> > +  return no_cost;
> >  }
> >  
> >  /* Estimates cost of expressing difference of addresses E1 - E2 as
> > @@ -3796,7 +3960,7 @@ ptr_difference_cost (struct ivopts_data *data,
> >        *offset += diff;
> >        *symbol_present = false;
> >        *var_present = false;
> > -      return zero_cost;
> > +      return no_cost;
> >      }
> >  
> >    if (integer_zerop (e2))
> > @@ -3846,7 +4010,7 @@ difference_cost (struct ivopts_data *data,
> >    if (operand_equal_p (e1, e2, 0))
> >      {
> >        *var_present = false;
> > -      return zero_cost;
> > +      return no_cost;
> >      }
> >  
> >    *var_present = true;
> > @@ -3857,7 +4021,7 @@ difference_cost (struct ivopts_data *data,
> >    if (integer_zerop (e1))
> >      {
> >        comp_cost cost = force_var_cost (data, e2, depends_on);
> > -      cost.cost += multiply_by_cost (-1, mode, data->speed);
> > +      cost.cost += multiply_by_const_cost (-1, mode, data->speed);
> >        return cost;
> >      }
> >  
> > @@ -4168,7 +4332,7 @@ get_computation_cost_at (struct ivopts_data *data,
> >  					 &symbol_present, &var_present,
> >  					 &offset, depends_on));
> >        cost.cost /= avg_loop_niter (data->current_loop);
> > -      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
> > +      cost.cost += add_regs_cost (TYPE_MODE (ctype), data->speed);
> >      }
> >  
> >    if (inv_expr_id)
> > @@ -4201,7 +4365,7 @@ get_computation_cost_at (struct ivopts_data *data,
> >    if (!symbol_present && !var_present && !offset)
> >      {
> >        if (ratio != 1)
> > -	cost.cost += multiply_by_cost (ratio, TYPE_MODE (ctype), speed);
> > +	cost.cost += multiply_by_const_cost (ratio, TYPE_MODE (ctype), speed);
> >        return cost;
> >      }
> >  
> > @@ -4209,18 +4373,18 @@ get_computation_cost_at (struct ivopts_data *data,
> >        are added once to the variable, if present.  */
> >    if (var_present && (symbol_present || offset))
> >      cost.cost += adjust_setup_cost (data,
> > -				    add_cost (TYPE_MODE (ctype), speed));
> > +				    add_regs_cost (TYPE_MODE (ctype), speed));
> >  
> >    /* Having offset does not affect runtime cost in case it is added to
> >       symbol, but it increases complexity.  */
> >    if (offset)
> >      cost.complexity++;
> >  
> > -  cost.cost += add_cost (TYPE_MODE (ctype), speed);
> > +  cost.cost += add_regs_cost (TYPE_MODE (ctype), speed);
> >  
> >    aratio = ratio > 0 ? ratio : -ratio;
> >    if (aratio != 1)
> > -    cost.cost += multiply_by_cost (aratio, TYPE_MODE (ctype), speed);
> > +    cost.cost += multiply_by_const_cost (aratio, TYPE_MODE (ctype), speed);
> >    return cost;
> >  
> >  fallback:
> > @@ -4277,7 +4441,7 @@ determine_use_iv_cost_generic (struct ivopts_data
> >    if (cand->pos == IP_ORIGINAL
> >        && cand->incremented_at == use->stmt)
> >      {
> > -      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE,
> > +      set_use_iv_cost (data, use, cand, no_cost, NULL, NULL_TREE,
> >                         ERROR_MARK, -1);
> >        return true;
> >      }
> > @@ -5066,7 +5230,7 @@ determine_iv_cost (struct ivopts_data *data, struc
> >       or a const set.  */
> >    if (cost_base.cost == 0)
> >      cost_base.cost = COSTS_N_INSNS (1);
> > -  cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
> > +  cost_step = add_regs_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
> >  
> >    cost = cost_step + adjust_setup_cost (data, cost_base.cost);
> >  
> > @@ -5561,10 +5725,10 @@ iv_ca_new (struct ivopts_data *data)
> >    nw->cands = BITMAP_ALLOC (NULL);
> >    nw->n_cands = 0;
> >    nw->n_regs = 0;
> > -  nw->cand_use_cost = zero_cost;
> > +  nw->cand_use_cost = no_cost;
> >    nw->cand_cost = 0;
> >    nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
> > -  nw->cost = zero_cost;
> > +  nw->cost = no_cost;
> >    nw->used_inv_expr = XCNEWVEC (unsigned, data->inv_expr_id + 1);
> >    nw->num_used_inv_expr = 0;
> >  
> > @@ -6638,6 +6802,8 @@ tree_ssa_iv_optimize_finalize (struct ivopts_data
> >    VEC_free (iv_use_p, heap, data->iv_uses);
> >    VEC_free (iv_cand_p, heap, data->iv_candidates);
> >    htab_delete (data->inv_expr_tab);
> > +  htab_delete (mult_costs[0]);
> > +  htab_delete (mult_costs[1]);
> >  }
> >  
> >  /* Returns true if the loop body BODY includes any function calls.  */
> > Index: gcc/tree-ssa-address.c
> > ===================================================================
> > --- gcc/tree-ssa-address.c	(revision 188839)
> > +++ gcc/tree-ssa-address.c	(working copy)
> > @@ -556,7 +556,7 @@ most_expensive_mult_to_index (tree type, struct me
> >  	  || !multiplier_allowed_in_address_p (coef, TYPE_MODE (type), as))
> >  	continue;
> >  
> > -      acost = multiply_by_cost (coef, address_mode, speed);
> > +      acost = multiply_by_const_cost (coef, address_mode, speed);
> >  
> >        if (acost > best_mult_cost)
> >  	{
> > Index: gcc/tree-flow.h
> > ===================================================================
> > --- gcc/tree-flow.h	(revision 188839)
> > +++ gcc/tree-flow.h	(working copy)
> > @@ -810,7 +810,12 @@ bool expr_invariant_in_loop_p (struct loop *, tree
> >  bool stmt_invariant_in_loop_p (struct loop *, gimple);
> >  bool multiplier_allowed_in_address_p (HOST_WIDE_INT, enum machine_mode,
> >  				      addr_space_t);
> > -unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
> > +unsigned multiply_by_const_cost (HOST_WIDE_INT, enum machine_mode, bool);
> > +unsigned add_regs_cost (enum machine_mode, bool);
> > +unsigned multiply_regs_cost (enum machine_mode, bool);
> > +unsigned add_const_cost (enum machine_mode, bool);
> > +unsigned extend_or_trunc_reg_cost (tree, tree, bool);
> > +unsigned negate_reg_cost (enum machine_mode, bool);
> >  bool may_be_nonaddressable_p (tree expr);
> >  
> >  /* In tree-ssa-threadupdate.c.  */
> > 
> > 
> > 
>
William J. Schmidt - June 22, 2012, 11:21 p.m.
On Fri, 2012-06-22 at 10:44 +0200, Richard Guenther wrote:
> On Thu, 21 Jun 2012, William J. Schmidt wrote:

> > I ran into a glitch with multiply_by_const_cost.  The original code
> > declared a static htab_t in the function and allocated it on demand.
> > When I tried adding a second one in the same manner, I ran into a
> > locking problem in the memory management library code during a call to
> > delete_htab.  The original implementation seemed a bit dicey to me
> > anyway, so I changed this to explicitly allocate and deallocate the hash
> > tables on (entry to/exit from) IVOPTS.
> 
> Huh.  That's weird and should not happen.  Still it makes sense to
> move this to a per-function cache given that its size is basically
> unbound.
> 

Hm, this appears not to be related to my changes.  I ran into the same
issue when bootstrapping some other change without any of the IVOPTS
changes committed.  In both cases the stuck lock occurred when compiling
tree-vect-stmts.c.  I'll try to debug this when I get some time, unless
somebody else figures it out sooner.

Bill

Patch

Index: gcc/double-int.c
===================================================================
--- gcc/double-int.c	(revision 188839)
+++ gcc/double-int.c	(working copy)
@@ -865,6 +865,26 @@  double_int_umod (double_int a, double_int b, unsig
   return double_int_mod (a, b, true, code);
 }
 
+/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
+   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
+   unchanged.  */
+
+bool
+double_int_multiple_of (double_int product, double_int factor,
+			bool unsigned_p, double_int *multiple)
+{
+  double_int remainder;
+  double_int quotient = double_int_divmod (product, factor, unsigned_p,
+					   TRUNC_DIV_EXPR, &remainder);
+  if (double_int_zero_p (remainder))
+    {
+      *multiple = quotient;
+      return true;
+    }
+
+  return false;
+}
+
 /* Set BITPOS bit in A.  */
 double_int
 double_int_setbit (double_int a, unsigned bitpos)
Index: gcc/double-int.h
===================================================================
--- gcc/double-int.h	(revision 188839)
+++ gcc/double-int.h	(working copy)
@@ -150,6 +150,8 @@  double_int double_int_divmod (double_int, double_i
 double_int double_int_sdivmod (double_int, double_int, unsigned, double_int *);
 double_int double_int_udivmod (double_int, double_int, unsigned, double_int *);
 
+bool double_int_multiple_of (double_int, double_int, bool, double_int *);
+
 double_int double_int_setbit (double_int, unsigned);
 int double_int_ctz (double_int);
 
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 188839)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -89,13 +89,11 @@  along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "tree-inline.h"
 #include "tree-ssa-propagate.h"
-
-/* FIXME: add_cost and zero_cost defined in exprmed.h conflict with local uses.
- */
 #include "expmed.h"
-#undef add_cost
-#undef zero_cost
 
+static hashval_t mbc_entry_hash (const void *);
+static int mbc_entry_eq (const void*, const void *);
+
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
    interface between the GIMPLE and RTL worlds.  */
@@ -162,7 +160,7 @@  typedef struct
 			   complex expressions and addressing modes).  */
 } comp_cost;
 
-static const comp_cost zero_cost = {0, 0};
+static const comp_cost no_cost = {0, 0};
 static const comp_cost infinite_cost = {INFTY, INFTY};
 
 /* The candidate - cost pair.  */
@@ -386,6 +384,9 @@  struct iv_ca_delta
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
+/* Cached costs for multiplies by constants.  */
+static htab_t mult_costs[2];
+
 static comp_cost force_expr_to_var_cost (tree, bool);
 
 /* Number of uses recorded in DATA.  */
@@ -869,6 +870,9 @@  tree_ssa_iv_optimize_init (struct ivopts_data *dat
                                     htab_inv_expr_eq, free);
   data->inv_expr_id = 0;
   decl_rtl_to_reset = VEC_alloc (tree, heap, 20);
+
+  mult_costs[0] = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
+  mult_costs[1] = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
 }
 
 /* Returns a memory object to that EXPR points.  In case we are able to
@@ -3057,16 +3061,19 @@  adjust_setup_cost (struct ivopts_data *data, unsig
 
 /* Returns cost of addition in MODE.  */
 
-static unsigned
-add_cost (enum machine_mode mode, bool speed)
+unsigned
+add_regs_cost (enum machine_mode mode, bool speed)
 {
-  static unsigned costs[NUM_MACHINE_MODES];
+  static unsigned costs[NUM_MACHINE_MODES][2];
   rtx seq;
   unsigned cost;
 
-  if (costs[mode])
-    return costs[mode];
+  if (speed)
+    speed = 1;
 
+  if (costs[mode][speed])
+    return costs[mode][speed];
+
   start_sequence ();
   force_operand (gen_rtx_fmt_ee (PLUS, mode,
 				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
@@ -3079,7 +3086,7 @@  adjust_setup_cost (struct ivopts_data *data, unsig
   if (!cost)
     cost = 1;
 
-  costs[mode] = cost;
+  costs[mode][speed] = cost;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Addition in %s costs %d\n",
@@ -3087,6 +3094,160 @@  adjust_setup_cost (struct ivopts_data *data, unsig
   return cost;
 }
 
+/* Returns cost of multiplication in MODE.  */
+
+unsigned
+multiply_regs_cost (enum machine_mode mode, bool speed)
+{
+  static unsigned costs[NUM_MACHINE_MODES][2];
+  rtx seq;
+  unsigned cost;
+
+  if (speed)
+    speed = 1;
+
+  if (costs[mode][speed])
+    return costs[mode][speed];
+
+  start_sequence ();
+  force_operand (gen_rtx_fmt_ee (MULT, mode,
+				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
+				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)),
+		 NULL_RTX);
+  seq = get_insns ();
+  end_sequence ();
+
+  cost = seq_cost (seq, speed);
+  if (!cost)
+    cost = 1;
+
+  costs[mode][speed] = cost;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Multiplication in %s costs %d\n",
+	     GET_MODE_NAME (mode), cost);
+  return cost;
+}
+
+/* Returns cost of addition with a constant in MODE.  */
+
+unsigned
+add_const_cost (enum machine_mode mode, bool speed)
+{
+  static unsigned costs[NUM_MACHINE_MODES][2];
+  rtx seq;
+  unsigned cost;
+
+  if (speed)
+    speed = 1;
+
+  if (costs[mode][speed])
+    return costs[mode][speed];
+
+  /* Arbitrarily generate insns for x + 2, as the exact constant
+     shouldn't matter.  */
+  start_sequence ();
+  force_operand (gen_rtx_fmt_ee (PLUS, mode,
+				 gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
+				 gen_int_mode (2, mode)),
+		 NULL_RTX);
+  seq = get_insns ();
+  end_sequence ();
+
+  cost = seq_cost (seq, speed);
+  if (!cost)
+    cost = 1;
+
+  costs[mode][speed] = cost;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Addition to constant in %s costs %d\n",
+	     GET_MODE_NAME (mode), cost);
+  return cost;
+}
+
+/* Returns cost of extend or truncate in MODE.  */
+
+unsigned
+extend_or_trunc_reg_cost (tree type_to, tree type_from, bool speed)
+{
+  static unsigned costs[NUM_MACHINE_MODES][NUM_MACHINE_MODES][2];
+  rtx seq;
+  unsigned cost;
+  enum machine_mode mode_to = TYPE_MODE (type_to);
+  enum machine_mode mode_from = TYPE_MODE (type_from);
+  tree size_to = TYPE_SIZE (type_to);
+  tree size_from = TYPE_SIZE (type_from);
+  enum rtx_code code;
+
+  gcc_assert (TREE_CODE (size_to) == INTEGER_CST
+	      && TREE_CODE (size_from) == INTEGER_CST);
+
+  if (speed)
+    speed = 1;
+
+  if (costs[mode_to][mode_from][speed])
+    return costs[mode_to][mode_from][speed];
+
+  if (tree_int_cst_lt (size_to, size_from))
+    code = TRUNCATE;
+  else if (TYPE_UNSIGNED (type_to))
+    code = ZERO_EXTEND;
+  else
+    code = SIGN_EXTEND;
+
+  start_sequence ();
+  gen_rtx_fmt_e (code, mode_to,
+		 gen_raw_REG (mode_from, LAST_VIRTUAL_REGISTER + 1));
+  seq = get_insns ();
+  end_sequence ();
+
+  cost = seq_cost (seq, speed);
+  if (!cost)
+    cost = 1;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Conversion from %s to %s costs %d\n",
+	     GET_MODE_NAME (mode_to), GET_MODE_NAME (mode_from), cost);
+
+  costs[mode_to][mode_from][speed] = cost;
+  return cost;
+}
+
+/* Returns cost of negation in MODE.  */
+
+unsigned
+negate_reg_cost (enum machine_mode mode, bool speed)
+{
+  static unsigned costs[NUM_MACHINE_MODES][2];
+  rtx seq;
+  unsigned cost;
+
+  if (speed)
+    speed = 1;
+
+  if (costs[mode][speed])
+    return costs[mode][speed];
+
+  start_sequence ();
+  force_operand (gen_rtx_fmt_e (NEG, mode,
+				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
+		 NULL_RTX);
+  seq = get_insns ();
+  end_sequence ();
+
+  cost = seq_cost (seq, speed);
+  if (!cost)
+    cost = 1;
+
+  costs[mode][speed] = cost;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Negation in %s costs %d\n",
+	     GET_MODE_NAME (mode), cost);
+  return cost;
+}
+
 /* Entry in a hashtable of already known costs for multiplication.  */
 struct mbc_entry
 {
@@ -3120,19 +3281,20 @@  mbc_entry_eq (const void *entry1, const void *entr
 /* Returns cost of multiplication by constant CST in MODE.  */
 
 unsigned
-multiply_by_cost (HOST_WIDE_INT cst, enum machine_mode mode, bool speed)
+multiply_by_const_cost (HOST_WIDE_INT cst, enum machine_mode mode, bool speed)
 {
-  static htab_t costs;
   struct mbc_entry **cached, act;
   rtx seq;
   unsigned cost;
 
-  if (!costs)
-    costs = htab_create (100, mbc_entry_hash, mbc_entry_eq, free);
+  if (speed)
+    speed = 1;
 
   act.mode = mode;
   act.cst = cst;
-  cached = (struct mbc_entry **) htab_find_slot (costs, &act, INSERT);
+  cached = (struct mbc_entry **)
+    htab_find_slot (mult_costs[speed], &act, INSERT);
+    
   if (*cached)
     return (*cached)->cost;
 
@@ -3418,7 +3580,7 @@  get_address_cost (bool symbol_present, bool var_pr
 	 If VAR_PRESENT is true, try whether the mode with
 	 SYMBOL_PRESENT = false is cheaper even with cost of addition, and
 	 if this is the case, use it.  */
-      add_c = add_cost (address_mode, speed);
+      add_c = add_regs_cost (address_mode, speed);
       for (i = 0; i < 8; i++)
 	{
 	  var_p = i & 1;
@@ -3499,10 +3661,10 @@  get_address_cost (bool symbol_present, bool var_pr
 	     && multiplier_allowed_in_address_p (ratio, mem_mode, as));
 
   if (ratio != 1 && !ratio_p)
-    cost += multiply_by_cost (ratio, address_mode, speed);
+    cost += multiply_by_const_cost (ratio, address_mode, speed);
 
   if (s_offset && !offset_p && !symbol_present)
-    cost += add_cost (address_mode, speed);
+    cost += add_regs_cost (address_mode, speed);
 
   if (may_autoinc)
     *may_autoinc = autoinc;
@@ -3601,7 +3763,7 @@  force_expr_to_var_cost (tree expr, bool speed)
   STRIP_NOPS (expr);
 
   if (SSA_VAR_P (expr))
-    return zero_cost;
+    return no_cost;
 
   if (is_gimple_min_invariant (expr))
     {
@@ -3633,12 +3795,12 @@  force_expr_to_var_cost (tree expr, bool speed)
       STRIP_NOPS (op1);
 
       if (is_gimple_val (op0))
-	cost0 = zero_cost;
+	cost0 = no_cost;
       else
 	cost0 = force_expr_to_var_cost (op0, speed);
 
       if (is_gimple_val (op1))
-	cost1 = zero_cost;
+	cost1 = no_cost;
       else
 	cost1 = force_expr_to_var_cost (op1, speed);
 
@@ -3650,11 +3812,11 @@  force_expr_to_var_cost (tree expr, bool speed)
       op1 = NULL_TREE;
 
       if (is_gimple_val (op0))
-	cost0 = zero_cost;
+	cost0 = no_cost;
       else
 	cost0 = force_expr_to_var_cost (op0, speed);
 
-      cost1 = zero_cost;
+      cost1 = no_cost;
       break;
 
     default:
@@ -3669,7 +3831,7 @@  force_expr_to_var_cost (tree expr, bool speed)
     case PLUS_EXPR:
     case MINUS_EXPR:
     case NEGATE_EXPR:
-      cost = new_cost (add_cost (mode, speed), 0);
+      cost = new_cost (add_regs_cost (mode, speed), 0);
       if (TREE_CODE (expr) != NEGATE_EXPR)
         {
           tree mult = NULL_TREE;
@@ -3689,9 +3851,11 @@  force_expr_to_var_cost (tree expr, bool speed)
 
     case MULT_EXPR:
       if (cst_and_fits_in_hwi (op0))
-	cost = new_cost (multiply_by_cost (int_cst_value (op0), mode, speed), 0);
+	cost = new_cost (multiply_by_const_cost (int_cst_value (op0),
+						 mode, speed), 0);
       else if (cst_and_fits_in_hwi (op1))
-	cost = new_cost (multiply_by_cost (int_cst_value (op1), mode, speed), 0);
+	cost = new_cost (multiply_by_const_cost (int_cst_value (op1),
+						 mode, speed), 0);
       else
 	return new_cost (target_spill_cost [speed], 0);
       break;
@@ -3766,12 +3930,12 @@  split_address_cost (struct ivopts_data *data,
     {
       *symbol_present = true;
       *var_present = false;
-      return zero_cost;
+      return no_cost;
     }
 
   *symbol_present = false;
   *var_present = true;
-  return zero_cost;
+  return no_cost;
 }
 
 /* Estimates cost of expressing difference of addresses E1 - E2 as
@@ -3796,7 +3960,7 @@  ptr_difference_cost (struct ivopts_data *data,
       *offset += diff;
       *symbol_present = false;
       *var_present = false;
-      return zero_cost;
+      return no_cost;
     }
 
   if (integer_zerop (e2))
@@ -3846,7 +4010,7 @@  difference_cost (struct ivopts_data *data,
   if (operand_equal_p (e1, e2, 0))
     {
       *var_present = false;
-      return zero_cost;
+      return no_cost;
     }
 
   *var_present = true;
@@ -3857,7 +4021,7 @@  difference_cost (struct ivopts_data *data,
   if (integer_zerop (e1))
     {
       comp_cost cost = force_var_cost (data, e2, depends_on);
-      cost.cost += multiply_by_cost (-1, mode, data->speed);
+      cost.cost += multiply_by_const_cost (-1, mode, data->speed);
       return cost;
     }
 
@@ -4168,7 +4332,7 @@  get_computation_cost_at (struct ivopts_data *data,
 					 &symbol_present, &var_present,
 					 &offset, depends_on));
       cost.cost /= avg_loop_niter (data->current_loop);
-      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
+      cost.cost += add_regs_cost (TYPE_MODE (ctype), data->speed);
     }
 
   if (inv_expr_id)
@@ -4201,7 +4365,7 @@  get_computation_cost_at (struct ivopts_data *data,
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
-	cost.cost += multiply_by_cost (ratio, TYPE_MODE (ctype), speed);
+	cost.cost += multiply_by_const_cost (ratio, TYPE_MODE (ctype), speed);
       return cost;
     }
 
@@ -4209,18 +4373,18 @@  get_computation_cost_at (struct ivopts_data *data,
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += adjust_setup_cost (data,
-				    add_cost (TYPE_MODE (ctype), speed));
+				    add_regs_cost (TYPE_MODE (ctype), speed));
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
   if (offset)
     cost.complexity++;
 
-  cost.cost += add_cost (TYPE_MODE (ctype), speed);
+  cost.cost += add_regs_cost (TYPE_MODE (ctype), speed);
 
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
-    cost.cost += multiply_by_cost (aratio, TYPE_MODE (ctype), speed);
+    cost.cost += multiply_by_const_cost (aratio, TYPE_MODE (ctype), speed);
   return cost;
 
 fallback:
@@ -4277,7 +4441,7 @@  determine_use_iv_cost_generic (struct ivopts_data
   if (cand->pos == IP_ORIGINAL
       && cand->incremented_at == use->stmt)
     {
-      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE,
+      set_use_iv_cost (data, use, cand, no_cost, NULL, NULL_TREE,
                        ERROR_MARK, -1);
       return true;
     }
@@ -5066,7 +5230,7 @@  determine_iv_cost (struct ivopts_data *data, struc
      or a const set.  */
   if (cost_base.cost == 0)
     cost_base.cost = COSTS_N_INSNS (1);
-  cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
+  cost_step = add_regs_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
 
@@ -5561,10 +5725,10 @@  iv_ca_new (struct ivopts_data *data)
   nw->cands = BITMAP_ALLOC (NULL);
   nw->n_cands = 0;
   nw->n_regs = 0;
-  nw->cand_use_cost = zero_cost;
+  nw->cand_use_cost = no_cost;
   nw->cand_cost = 0;
   nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
-  nw->cost = zero_cost;
+  nw->cost = no_cost;
   nw->used_inv_expr = XCNEWVEC (unsigned, data->inv_expr_id + 1);
   nw->num_used_inv_expr = 0;
 
@@ -6638,6 +6802,8 @@  tree_ssa_iv_optimize_finalize (struct ivopts_data
   VEC_free (iv_use_p, heap, data->iv_uses);
   VEC_free (iv_cand_p, heap, data->iv_candidates);
   htab_delete (data->inv_expr_tab);
+  htab_delete (mult_costs[0]);
+  htab_delete (mult_costs[1]);
 }
 
 /* Returns true if the loop body BODY includes any function calls.  */
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	(revision 188839)
+++ gcc/tree-ssa-address.c	(working copy)
@@ -556,7 +556,7 @@  most_expensive_mult_to_index (tree type, struct me
 	  || !multiplier_allowed_in_address_p (coef, TYPE_MODE (type), as))
 	continue;
 
-      acost = multiply_by_cost (coef, address_mode, speed);
+      acost = multiply_by_const_cost (coef, address_mode, speed);
 
       if (acost > best_mult_cost)
 	{
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 188839)
+++ gcc/tree-flow.h	(working copy)
@@ -810,7 +810,12 @@  bool expr_invariant_in_loop_p (struct loop *, tree
 bool stmt_invariant_in_loop_p (struct loop *, gimple);
 bool multiplier_allowed_in_address_p (HOST_WIDE_INT, enum machine_mode,
 				      addr_space_t);
-unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
+unsigned multiply_by_const_cost (HOST_WIDE_INT, enum machine_mode, bool);
+unsigned add_regs_cost (enum machine_mode, bool);
+unsigned multiply_regs_cost (enum machine_mode, bool);
+unsigned add_const_cost (enum machine_mode, bool);
+unsigned extend_or_trunc_reg_cost (tree, tree, bool);
+unsigned negate_reg_cost (enum machine_mode, bool);
 bool may_be_nonaddressable_p (tree expr);
 
 /* In tree-ssa-threadupdate.c.  */