[RFC] New target interface for vectorizer cost model

Submitter William J. Schmidt
Date July 6, 2012, 2:03 a.m.
Message ID <1341540219.10579.7.camel@gnopaine>
Permalink /patch/169305/
State New

Comments

William J. Schmidt - July 6, 2012, 2:03 a.m.
On Wed, 2012-07-04 at 10:49 +0200, Richard Guenther wrote:
> On Tue, 3 Jul 2012, William J. Schmidt wrote:
> 
> > Hi Richard,
> > 
> > Here's a revision incorporating changes addressing your comments.  As
> > before it passes bootstrap and regression testing on powerpc64-linux-gnu
> > and compiles SPEC cpu2000 and cpu2006 with identical cost model
> > results.  
> > 
> > Before committing the patch I would remove the two gcc_asserts that
> > verify the cost models match.  I think a follow-up patch should then fix
> 
> Will you also remove the then "dead" code computing the old cost?  I think
> it's odd to have a state committed where we compute both but only
> use one ...
> 
> > the costs that appear to be incorrectly not counted by the old model (by
> > un-commenting-out the two chunks of code identified in the patch).  I'd
> > want to verify this doesn't cause any bad changes of behavior, since it
> > could result in fewer vectorized loops.
> > 
> > Ok for trunk?
> 
> ... so I'd say yes, ok for trunk, but please wait until you have
> verified that the follow-up "fixing" the existing bugs by un-commenting
> the code works, and that another follow-up removing the old cost
> stuff works (I am confident that both will).
> 
> Which means we'd eventually have a single commit doing these three
> things (or three adjacent commits).
> 
> Thanks,
> Richard.

FYI, here's the combined patch with all of the above.  I've verified no
adverse effects on CPU2000; CPU2006 runs are in progress.  Provided
those go well, I'll plan to commit this unless there are objections.
I'll wait until after Cauldron so I have attention to spare in case of
any fallout.

A couple of quick notes:

 * I added an unsigned return value to add_stmt_cost and
record_stmt_cost.  These now return a tentative value of the cost for
the statement(s) being added, which is needed in debug statements,
estimates for peeling alternatives, etc.

 * There are several warts in tree-vectorizer.h like:

      struct
      {
        int outside_of_loop;
      } cost;

...which look foolish for now.  But they'll be going away soon enough
when these get converted into prolog/epilog cost estimates.

Otherwise changes are straightforward and as expected.
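
To make the first note concrete, here is a minimal standalone sketch (not
code from the patch) of an accumulator-style cost model whose add routine
returns the amount it just added.  All sketch_ names, the three-entry kind
enum, and the unit costs are assumptions made up for illustration; the real
hooks take GCC types such as struct _stmt_vec_info.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for GCC's enum vect_cost_for_stmt; only three kinds are
   modeled here.  */
enum sketch_kind { SK_VECTOR_STMT, SK_VECTOR_LOAD, SK_VECTOR_STORE };

/* Hypothetical per-kind unit costs; a real target has its own tables.  */
int
sketch_unit_cost (enum sketch_kind kind)
{
  switch (kind)
    {
    case SK_VECTOR_LOAD:
      return 2;
    case SK_VECTOR_STORE:
      return 3;
    default:
      return 1;
    }
}

/* Allocate a zeroed accumulator and hand it back as opaque data.  */
void *
sketch_init_cost (void)
{
  unsigned *cost = (unsigned *) malloc (sizeof (unsigned));
  assert (cost != NULL);
  *cost = 0;
  return cost;
}

/* Accumulate COUNT statements of KIND and return the amount added.  */
unsigned
sketch_add_stmt_cost (void *data, int count, enum sketch_kind kind)
{
  unsigned added = (unsigned) (count * sketch_unit_cost (kind));
  *(unsigned *) data += added;
  return added;
}

/* Report the accumulated total.  */
unsigned
sketch_finish_cost (void *data)
{
  return *(unsigned *) data;
}

/* Release the accumulator.  */
void
sketch_destroy_cost_data (void *data)
{
  free (data);
}

/* Cost a body of two loads, one arithmetic statement and one store.
   The per-call return values are summed into *TENTATIVE_SUM, the way a
   dump message or a peeling estimate might use them.  */
unsigned
sketch_cost_body (unsigned *tentative_sum)
{
  void *data = sketch_init_cost ();
  unsigned total;

  *tentative_sum = 0;
  *tentative_sum += sketch_add_stmt_cost (data, 2, SK_VECTOR_LOAD);
  *tentative_sum += sketch_add_stmt_cost (data, 1, SK_VECTOR_STMT);
  *tentative_sum += sketch_add_stmt_cost (data, 1, SK_VECTOR_STORE);

  total = sketch_finish_cost (data);
  sketch_destroy_cost_data (data);
  return total;
}
```

With a plain accumulator the sum of the returned values equals the final
cost.  A target that adjusts costs when finishing could make the two differ,
which is why the returned value is only tentative.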

Thanks,
Bill


2012-07-06  Bill Schmidt  <wschmidt@linux.ibm.com>

	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_VECTORIZE_INIT_COST): New hook.
	(TARGET_VECTORIZE_ADD_STMT_COST): Likewise.
	(TARGET_VECTORIZE_FINISH_COST): Likewise.
	(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
	* targhooks.c (default_init_cost): New function.
	(default_add_stmt_cost): Likewise.
	(default_finish_cost): Likewise.
	(default_destroy_cost_data): Likewise.
	* targhooks.h (default_init_cost): New decl.
	(default_add_stmt_cost): Likewise.
	(default_finish_cost): Likewise.
	(default_destroy_cost_data): Likewise.
	* target.def (init_cost): New DEFHOOK.
	(add_stmt_cost): Likewise.
	(finish_cost): Likewise.
	(destroy_cost_data): Likewise.
	* target.h (struct _stmt_vec_info): New extern decl.
	(stmt_vectype): Likewise.
	(stmt_in_inner_loop_p): Likewise.
	* tree-vectorizer.h (stmt_info_for_cost): New struct/typedef.
	(stmt_vector_for_cost): New VEC/typedef.
	(add_stmt_info_to_vec): New function.
	(struct _slp_tree): Remove cost.inside_of_loop field.
	(struct _slp_instance): Remove cost.inside_of_loop field; add
	stmt_cost_vec field.
	(SLP_INSTANCE_INSIDE_OF_LOOP_COST): Remove macro.
	(SLP_INSTANCE_STMT_COST_VEC): New accessor macro.
	(SLP_TREE_INSIDE_OF_LOOP_COST): Remove macro.
	(struct _vect_peel_extended_info): Add stmt_cost_vec field.
	(struct _loop_vec_info): Add target_cost_data field.
	(LOOP_VINFO_TARGET_COST_DATA): New accessor macro.
	(struct _bb_vec_info): Add target_cost_data field.
	(BB_VINFO_TARGET_COST_DATA): New accessor macro.
	(struct _stmt_vec_info): Remove cost.inside_of_loop field.
	(STMT_VINFO_INSIDE_OF_LOOP_COST): Remove macro.
	(stmt_vinfo_set_inside_of_loop_cost): Remove function.
	(init_cost): New function.
	(add_stmt_cost): Likewise.
	(finish_cost): Likewise.
	(destroy_cost_data): Likewise.
	(vect_model_simple_cost): Change parameter list.
	(vect_model_store_cost): Likewise.
	(vect_model_load_cost): Likewise.
	(record_stmt_cost): New extern decl.
	(vect_get_load_cost): Change parameter list.
	(vect_get_store_cost): Likewise.
	* tree-vect-loop.c (new_loop_vec_info): Call init_cost.
	(destroy_loop_vec_info): Call destroy_cost_data.
	(vect_estimate_min_profitable_iters): Remove old calculation of
	inside costs; call finish_cost instead.
	(vect_model_reduction_cost): Call add_stmt_cost instead of old
	inside-costs calculation.
	(vect_model_induction_cost): Likewise.
	* tree-vect-data-refs.c (vect_get_data_access_cost): Change to
	return a stmt_vector_for_cost; modify calls to vect_get_load_cost
	and vect_get_store_cost to obtain the value to return.
	(vect_peeling_hash_get_lowest_cost): Obtain a stmt_cost_vec from
	vect_get_data_access_cost and store it in the minimum peeling
	structure.
	(vect_peeling_hash_choose_best_peeling): Change the parameter list
	to add a (stmt_vector_for_cost *) output parameter, and set its value.
	(vect_enhance_data_refs_alignment): Ignore the new return value from
	calls to vect_get_data_access_cost; obtain stmt_cost_vec from
	vect_peeling_hash_choose_best_peeling and pass its contents to the
	target cost model.
	* tree-vect-stmts.c (stmt_vectype): New function.
	(stmt_in_inner_loop_p): Likewise.
	(record_stmt_cost): Likewise.
	(vect_model_simple_cost): Add stmt_cost_vec parameter; call
	record_stmt_cost instead of old calculation; don't call
	stmt_vinfo_set_inside_of_loop_cost.
	(vect_model_promotion_demotion_cost): Call add_stmt_cost instead of
	old calculation; don't call stmt_vinfo_set_inside_of_loop_cost.
	(vect_model_store_cost): Add stmt_cost_vec parameter; call
	record_stmt_cost instead of old calculation; add stmt_cost_vec
	parameter to vect_get_store_cost call; don't call
	stmt_vinfo_set_inside_of_loop_cost.
	(vect_get_store_cost): Add stmt_cost_vec parameter; call
	record_stmt_cost instead of old calculation.
	(vect_model_load_cost): Add stmt_cost_vec parameter; call
	record_stmt_cost instead of old calculation; add stmt_cost_vec
	parameter to vect_get_load_cost call; don't call
	stmt_vinfo_set_inside_of_loop_cost.
	(vect_get_load_cost): Add stmt_cost_vec parameter; call
	record_stmt_cost instead of old calculation.
	(vectorizable_call): Add NULL parameter to vect_model_simple_cost call.
	(vectorizable_conversion): Likewise.
	(vectorizable_assignment): Likewise.
	(vectorizable_shift): Likewise.
	(vectorizable_operation): Likewise.
	(vectorizable_store): Add NULL parameter to vect_model_store_cost call.
	(vectorizable_load): Add NULL parameter to vect_model_load_cost call.
	(new_stmt_vec_info): Don't set STMT_VINFO_INSIDE_OF_LOOP_COST.
	* config/spu/spu.c (TARGET_VECTORIZE_INIT_COST): New macro def.
	(TARGET_VECTORIZE_ADD_STMT_COST): Likewise.
	(TARGET_VECTORIZE_FINISH_COST): Likewise.
	(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
	(spu_init_cost): New function.
	(spu_add_stmt_cost): Likewise.
	(spu_finish_cost): Likewise.
	(spu_destroy_cost_data): Likewise.
	* config/i386/i386.c (ix86_init_cost): New function.
	(ix86_add_stmt_cost): Likewise.
	(ix86_finish_cost): Likewise.
	(ix86_destroy_cost_data): Likewise.
	(TARGET_VECTORIZE_INIT_COST): New macro def.
	(TARGET_VECTORIZE_ADD_STMT_COST): Likewise.
	(TARGET_VECTORIZE_FINISH_COST): Likewise.
	(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
	* config/rs6000/rs6000.c (TARGET_VECTORIZE_INIT_COST): New macro def.
	(TARGET_VECTORIZE_ADD_STMT_COST): Likewise.
	(TARGET_VECTORIZE_FINISH_COST): Likewise.
	(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
	(rs6000_init_cost): New function.
	(rs6000_add_stmt_cost): Likewise.
	(rs6000_finish_cost): Likewise.
	(rs6000_destroy_cost_data): Likewise.
	* tree-vect-slp.c (vect_free_slp_instance): Free stmt_cost_vec.
	(vect_create_new_slp_node): Don't set SLP_TREE_INSIDE_OF_LOOP_COST.
	(vect_get_and_check_slp_defs): Add stmt_cost_vec parameter; add
	stmt_cost_vec parameter to vect_model_store_cost and
	vect_model_simple_cost calls.
	(vect_build_slp_tree): Remove inside_cost parameter; add stmt_cost_vec
	parameter; add stmt_cost_vec parameter to vect_get_and_check_slp_defs,
	vect_model_load_cost, and recursive vect_build_slp_tree calls; prevent
	calculating cost more than once for loads; remove inside_cost
	parameter from recursive vect_build_slp_tree calls; call
	record_stmt_cost instead of old calculation.
	(vect_analyze_slp_instance): Allocate stmt_cost_vec and save it with
	the instance; free it on premature exit; remove inside_cost parameter
	from vect_build_slp_tree call; add stmt_cost_vec parameter to
	vect_build_slp_tree call; don't set SLP_INSTANCE_INSIDE_OF_LOOP_COST.
	(new_bb_vec_info): Call init_cost.
	(destroy_bb_vec_info): Call destroy_cost_data.
	(vect_bb_vectorization_profitable_p): Call add_stmt_cost for each
	statement recorded with an SLP instance; call finish_cost instead of
	the old calculation.
	(vect_update_slp_costs_according_to_vf): Record statement costs from
	SLP instances, multiplying by the appropriate number of copies; don't
	update SLP_INSTANCE_INSIDE_OF_LOOP_COST.
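
The record-now, cost-later idea behind stmt_cost_vec can be sketched in
standalone C.  This is an illustration under assumptions, not the patch's
code: the sketch_ names, the hand-rolled growable array (standing in for
GCC's VEC), and the unit costs are all hypothetical.  Entries are recorded
while the SLP instance is analyzed and are later replayed into an
accumulator, scaled by the number of copies.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for GCC's enum vect_cost_for_stmt.  */
enum sketch_kind { SK_VECTOR_STMT, SK_VECTOR_LOAD, SK_VECTOR_STORE };

/* One group of like statements, loosely modeled on stmt_info_for_cost.  */
struct sketch_cost_entry
{
  int count;
  enum sketch_kind kind;
};

/* Growable array standing in for a VEC of cost entries.  */
struct sketch_cost_vec
{
  struct sketch_cost_entry *entries;
  size_t length;
  size_t capacity;
};

/* Append one entry, growing the array when it is full.  */
void
sketch_record (struct sketch_cost_vec *vec, int count, enum sketch_kind kind)
{
  if (vec->length == vec->capacity)
    {
      size_t new_capacity = vec->capacity ? vec->capacity * 2 : 4;
      struct sketch_cost_entry *grown
        = (struct sketch_cost_entry *)
          realloc (vec->entries,
                   new_capacity * sizeof (struct sketch_cost_entry));
      assert (grown != NULL);
      vec->entries = grown;
      vec->capacity = new_capacity;
    }
  vec->entries[vec->length].count = count;
  vec->entries[vec->length].kind = kind;
  vec->length++;
}

/* Hypothetical unit costs: loads 2, stores 3, everything else 1.  */
int
sketch_unit_cost (enum sketch_kind kind)
{
  if (kind == SK_VECTOR_LOAD)
    return 2;
  if (kind == SK_VECTOR_STORE)
    return 3;
  return 1;
}

/* Replay every recorded entry into *ACCUMULATOR, multiplying each count
   by NCOPIES.  Returns the amount added.  */
unsigned
sketch_replay (const struct sketch_cost_vec *vec, int ncopies,
               unsigned *accumulator)
{
  unsigned added = 0;
  size_t i;

  for (i = 0; i < vec->length; i++)
    added += (unsigned) (vec->entries[i].count * ncopies
                         * sketch_unit_cost (vec->entries[i].kind));
  *accumulator += added;
  return added;
}

/* Free the entry storage and reset the vector to empty.  */
void
sketch_release (struct sketch_cost_vec *vec)
{
  free (vec->entries);
  vec->entries = NULL;
  vec->length = 0;
  vec->capacity = 0;
}

/* Record two loads and one store, then replay them for four copies:
   4 * (2 * 2 + 1 * 3).  */
unsigned
sketch_demo (void)
{
  struct sketch_cost_vec vec = { NULL, 0, 0 };
  unsigned accumulator = 0;

  sketch_record (&vec, 2, SK_VECTOR_LOAD);
  sketch_record (&vec, 1, SK_VECTOR_STORE);
  sketch_replay (&vec, 4, &accumulator);
  sketch_release (&vec);
  return accumulator;
}
```

Deferring the costs this way means the multiplier does not have to be known
when the statements are first examined.
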
Richard Guenther - July 6, 2012, 8:27 a.m.
On Thu, 5 Jul 2012, William J. Schmidt wrote:

> Otherwise changes are straightforward and as expected.

Thanks!
Richard.

> Thanks,
> Bill
> 
> 
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi	(revision 189292)
> +++ gcc/doc/tm.texi	(working copy)
> @@ -5792,6 +5792,22 @@ mode returned by @code{TARGET_VECTORIZE_PREFERRED_
>  The default is zero which means to not iterate over other vector sizes.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
> +This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates an unsigned integer for accumulating a single cost.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_ADD_STMT_COST (void *@var{data}, int @var{count}, enum vect_cost_for_stmt @var{kind}, struct _stmt_vec_info *@var{stmt_info}, int @var{misalign})
> +This hook should update the target-specific @var{data} in response to adding @var{count} copies of the given @var{kind} of statement to the body of a loop or basic block.  The default adds the builtin vectorizer cost for the copies of the statement to the accumulator, and returns the amount added.  The return value should be viewed as a tentative cost that may later be overridden.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_FINISH_COST (void *@var{data})
> +This hook should complete calculations of the cost of vectorizing a loop or basic block based on @var{data}, and return that cost as an unsigned integer.  The default returns the value of the accumulator.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} void TARGET_VECTORIZE_DESTROY_COST_DATA (void *@var{data})
> +This hook should release @var{data} and any related data structures allocated by TARGET_VECTORIZE_INIT_COST.  The default releases the accumulator.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_TM_LOAD (tree)
>  This hook should return the built-in decl needed to load a vector of the given type within a transaction.
>  @end deftypefn
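
As an illustration of why the data is passed around as an opaque pointer,
here is a hedged sketch of a target that keeps more than a single
accumulator.  Nothing in it comes from the patch: the sketch_ names, the
structure fields, the unit costs, and the memory-operation penalty are all
invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for GCC's enum vect_cost_for_stmt.  */
enum sketch_kind { SK_VECTOR_STMT, SK_VECTOR_LOAD, SK_VECTOR_STORE };

/* Made-up threshold: memory operations beyond this many are penalized.  */
#define SKETCH_MEMORY_OP_LIMIT 4

/* Hypothetical target-private data; callers only see a void pointer.  */
struct sketch_target_cost
{
  unsigned body_cost;   /* Running sum of statement costs.  */
  int memory_ops;       /* Loads and stores seen so far.  */
  int in_loop;          /* Nonzero when costing a loop, zero for a block.  */
};

/* Allocate zeroed cost data and remember what is being costed.  */
void *
sketch_target_init_cost (int in_loop)
{
  struct sketch_target_cost *cost
    = (struct sketch_target_cost *) calloc (1, sizeof *cost);
  assert (cost != NULL);
  cost->in_loop = in_loop;
  return cost;
}

/* Add COUNT statements of KIND.  Memory operations cost 2 each and are
   also counted; other statements cost 1.  Returns the amount added,
   which does not include any adjustment made when finishing.  */
unsigned
sketch_target_add_stmt_cost (void *data, int count, enum sketch_kind kind)
{
  struct sketch_target_cost *cost = (struct sketch_target_cost *) data;
  int unit = (kind == SK_VECTOR_STMT) ? 1 : 2;
  unsigned added = (unsigned) (count * unit);

  if (kind != SK_VECTOR_STMT)
    cost->memory_ops += count;
  cost->body_cost += added;
  return added;
}

/* Apply a whole-body adjustment: in a loop, each memory operation past
   the limit costs one extra unit.  */
unsigned
sketch_target_finish_cost (void *data)
{
  struct sketch_target_cost *cost = (struct sketch_target_cost *) data;
  unsigned total = cost->body_cost;

  if (cost->in_loop && cost->memory_ops > SKETCH_MEMORY_OP_LIMIT)
    total += (unsigned) (cost->memory_ops - SKETCH_MEMORY_OP_LIMIT);
  return total;
}

/* Release the cost data.  */
void
sketch_target_destroy_cost_data (void *data)
{
  free (data);
}

/* Cost four loads, two stores and three arithmetic statements.  */
unsigned
sketch_target_demo (int in_loop)
{
  void *data = sketch_target_init_cost (in_loop);
  unsigned total;

  sketch_target_add_stmt_cost (data, 4, SK_VECTOR_LOAD);
  sketch_target_add_stmt_cost (data, 2, SK_VECTOR_STORE);
  sketch_target_add_stmt_cost (data, 3, SK_VECTOR_STMT);
  total = sketch_target_finish_cost (data);
  sketch_target_destroy_cost_data (data);
  return total;
}
```

In this sketch the values returned by the add routine sum to 15, while the
finish routine reports 17 for a loop.  That gap is one way a per-statement
return value can end up being "a tentative cost that may later be
overridden".
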
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in	(revision 189292)
> +++ gcc/doc/tm.texi.in	(working copy)
> @@ -5724,6 +5724,14 @@ mode returned by @code{TARGET_VECTORIZE_PREFERRED_
>  The default is zero which means to not iterate over other vector sizes.
>  @end deftypefn
>  
> +@hook TARGET_VECTORIZE_INIT_COST
> +
> +@hook TARGET_VECTORIZE_ADD_STMT_COST
> +
> +@hook TARGET_VECTORIZE_FINISH_COST
> +
> +@hook TARGET_VECTORIZE_DESTROY_COST_DATA
> +
>  @hook TARGET_VECTORIZE_BUILTIN_TM_LOAD
>  
>  @hook TARGET_VECTORIZE_BUILTIN_TM_STORE
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c	(revision 189292)
> +++ gcc/targhooks.c	(working copy)
> @@ -996,6 +996,64 @@ default_autovectorize_vector_sizes (void)
>    return 0;
>  }
>  
> +/* By default, the cost model just accumulates the inside_loop costs for
> +   a vectorized loop or block.  So allocate an unsigned int, set it to
> +   zero, and return its address.  */
> +
> +void *
> +default_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
> +{
> +  unsigned *cost = XNEW (unsigned);
> +  *cost = 0;
> +  return cost;
> +}
> +
> +/* By default, the cost model looks up the cost of the given statement
> +   kind and mode, multiplies it by the occurrence count, accumulates
> +   it into the cost, and returns the cost added.  */
> +
> +unsigned
> +default_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> +		       struct _stmt_vec_info *stmt_info, int misalign)
> +{
> +  unsigned *cost = (unsigned *) data;
> +  unsigned retval = 0;
> +
> +  if (flag_vect_cost_model)
> +    {
> +      tree vectype = stmt_vectype (stmt_info);
> +      int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
> +							  misalign);
> +      /* Statements in an inner loop relative to the loop being
> +	 vectorized are weighted more heavily.  The value here is
> +	 arbitrary and could potentially be improved with analysis.  */
> +      if (stmt_in_inner_loop_p (stmt_info))
> +	count *= 50;  /* FIXME.  */
> +
> +      retval = (unsigned) (count * stmt_cost);
> +      *cost += retval;
> +    }
> +
> +  return retval;
> +}
> +
> +/* By default, the cost model just returns the accumulated
> +   inside_loop cost.  */
> +
> +unsigned
> +default_finish_cost (void *data)
> +{
> +  return *((unsigned *) data);
> +}
> +
> +/* Free the cost data.  */
> +
> +void
> +default_destroy_cost_data (void *data)
> +{
> +  free (data);
> +}
> +
>  /* Determine whether or not a pointer mode is valid. Assume defaults
>     of ptr_mode or Pmode - can be overridden.  */
>  bool
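
The arithmetic in default_add_stmt_cost above can be worked through with a
standalone sketch.  The GCC-specific inputs are replaced by plain
parameters, which are assumptions of this example: model_enabled stands in
for flag_vect_cost_model, unit_cost for the value returned by
default_builtin_vectorization_cost, and in_inner_loop for the result of
stmt_in_inner_loop_p.

```c
#include <assert.h>

/* Same steps as the default hook: do nothing when the cost model is
   off; otherwise weight inner-loop statements by 50, multiply by the
   unit cost, accumulate, and return the amount added.  */
unsigned
sketch_default_add (unsigned *accumulator, int model_enabled, int count,
                    int unit_cost, int in_inner_loop)
{
  unsigned added = 0;

  if (model_enabled)
    {
      if (in_inner_loop)
        count *= 50;

      added = (unsigned) (count * unit_cost);
      *accumulator += added;
    }

  return added;
}
```

For example, two statements of unit cost 3 add 6 in the loop being
vectorized and 2 * 50 * 3 = 300 in an inner loop; with the cost model
disabled nothing is added and zero is returned.
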
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h	(revision 189292)
> +++ gcc/targhooks.h	(working copy)
> @@ -90,6 +90,11 @@ default_builtin_support_vector_misalignment (enum
>  					     int, bool);
>  extern enum machine_mode default_preferred_simd_mode (enum machine_mode mode);
>  extern unsigned int default_autovectorize_vector_sizes (void);
> +extern void *default_init_cost (struct loop *);
> +extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
> +				       struct _stmt_vec_info *, int);
> +extern unsigned default_finish_cost (void *);
> +extern void default_destroy_cost_data (void *);
>  
>  /* These are here, and not in hooks.[ch], because not all users of
>     hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def	(revision 189292)
> +++ gcc/target.def	(working copy)
> @@ -1063,6 +1063,55 @@ DEFHOOK
>   (const_tree mem_vectype, const_tree index_type, int scale),
>   NULL)
>  
> +/* Target function to initialize the cost model for a loop or block.  */
> +DEFHOOK
> +(init_cost,
> + "This hook should initialize target-specific data structures in preparation "
> + "for modeling the costs of vectorizing a loop or basic block.  The default "
> + "allocates an unsigned integer for accumulating a single cost.  "
> + "If @var{loop_info} is non-NULL, it identifies the loop being vectorized; "
> + "otherwise a single block is being vectorized.",
> + void *,
> + (struct loop *loop_info),
> + default_init_cost)
> +
> +/* Target function to record N statements of the given kind using the
> +   given vector type within the cost model data for the current loop
> +   or block.  */
> +DEFHOOK
> +(add_stmt_cost,
> + "This hook should update the target-specific @var{data} in response to "
> + "adding @var{count} copies of the given @var{kind} of statement to the "
> + "body of a loop or basic block.  The default adds the builtin vectorizer "
> + "cost for the copies of the statement to the accumulator, and returns "
> + "the amount added.  The return value should be viewed as a tentative "
> + "cost that may later be overridden.",
> + unsigned,
> + (void *data, int count, enum vect_cost_for_stmt kind,
> +  struct _stmt_vec_info *stmt_info, int misalign),
> + default_add_stmt_cost)
> +
> +/* Target function to calculate the total cost of the current vectorized
> +   loop or block.  */
> +DEFHOOK
> +(finish_cost,
> + "This hook should complete calculations of the cost of vectorizing a loop "
> + "or basic block based on @var{data}, and return that cost as an unsigned "
> + "integer.  The default returns the value of the accumulator.",
> + unsigned,
> + (void *data),
> + default_finish_cost)
> +
> +/* Function to delete target-specific cost modeling data.  */
> +DEFHOOK
> +(destroy_cost_data,
> + "This hook should release @var{data} and any related data structures "
> + "allocated by TARGET_VECTORIZE_INIT_COST.  The default releases the "
> + "accumulator.",
> + void,
> + (void *data),
> + default_destroy_cost_data)
> +
>  HOOK_VECTOR_END (vectorize)
>  
>  #undef HOOK_PREFIX
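
For readers unfamiliar with how a hook gets its default, the following is a
simplified standalone model of the arrangement as I understand it: each hook
is a function pointer in a table, a target may define a macro naming its own
function, and the default named in the DEFHOOK is used otherwise.  The
sketch_ and SKETCH_ names and the "+ 1" adjustment are hypothetical; GCC's
real table is generated from target.def and is considerably larger.

```c
#include <assert.h>
#include <stdlib.h>

/* Defaults, playing the role of the last DEFHOOK argument.  */
unsigned
sketch_default_finish_cost (void *data)
{
  return *(unsigned *) data;
}

void
sketch_default_destroy_cost_data (void *data)
{
  free (data);
}

/* A hypothetical target override that adds one unit of overhead.  */
unsigned
sketch_mytarget_finish_cost (void *data)
{
  return *(unsigned *) data + 1;
}

/* A target's source file would define this before the table is built.
   This sketch overrides only finish_cost.  */
#define SKETCH_TARGET_FINISH_COST sketch_mytarget_finish_cost

/* Fall back to the defaults for anything the target left undefined.  */
#ifndef SKETCH_TARGET_FINISH_COST
#define SKETCH_TARGET_FINISH_COST sketch_default_finish_cost
#endif
#ifndef SKETCH_TARGET_DESTROY_COST_DATA
#define SKETCH_TARGET_DESTROY_COST_DATA sketch_default_destroy_cost_data
#endif

/* The hook table that callers go through.  */
struct sketch_vectorize_hooks
{
  unsigned (*finish_cost) (void *data);
  void (*destroy_cost_data) (void *data);
};

struct sketch_vectorize_hooks sketch_hooks =
{
  SKETCH_TARGET_FINISH_COST,
  SKETCH_TARGET_DESTROY_COST_DATA
};
```

Callers only ever use the table entries, so a target can replace one hook
and inherit the rest.
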
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h	(revision 189292)
> +++ gcc/target.h	(working copy)
> @@ -120,6 +120,13 @@ struct loop;
>  /* This is defined in tree-ssa-alias.h.  */
>  struct ao_ref_s;
>  
> +/* This is defined in tree-vectorizer.h.  */
> +struct _stmt_vec_info;
> +
> +/* These are defined in tree-vect-stmts.c.  */
> +extern tree stmt_vectype (struct _stmt_vec_info *);
> +extern bool stmt_in_inner_loop_p (struct _stmt_vec_info *);
> +
>  /* Assembler instructions for creating various kinds of integer object.  */
>  
>  struct asm_int_op
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h	(revision 189292)
> +++ gcc/tree-vectorizer.h	(working copy)
> @@ -71,6 +71,32 @@ enum vect_def_type {
>                                     || ((D) == vect_double_reduction_def) \
>                                     || ((D) == vect_nested_cycle))
>  
> +/* Structure to encapsulate information about a group of like
> +   instructions to be presented to the target cost model.  */
> +typedef struct _stmt_info_for_cost {
> +  int count;
> +  enum vect_cost_for_stmt kind;
> +  gimple stmt;
> +  int misalign;
> +} stmt_info_for_cost;
> +
> +DEF_VEC_O (stmt_info_for_cost);
> +DEF_VEC_ALLOC_O (stmt_info_for_cost, heap);
> +
> +typedef VEC(stmt_info_for_cost, heap) *stmt_vector_for_cost;
> +
> +static inline void
> +add_stmt_info_to_vec (stmt_vector_for_cost *stmt_cost_vec, int count,
> +		      enum vect_cost_for_stmt kind, gimple stmt, int misalign)
> +{
> +  stmt_info_for_cost si;
> +  si.count = count;
> +  si.kind = kind;
> +  si.stmt = stmt;
> +  si.misalign = misalign;
> +  VEC_safe_push (stmt_info_for_cost, heap, *stmt_cost_vec, &si);
> +}
> +
>  /************************************************************************
>    SLP
>   ************************************************************************/
> @@ -96,7 +122,6 @@ typedef struct _slp_tree {
>    struct
>    {
>      int outside_of_loop;     /* Statements generated outside loop.  */
> -    int inside_of_loop;      /* Statements generated inside loop.  */
>    } cost;
>  } *slp_tree;
>  
> @@ -119,9 +144,11 @@ typedef struct _slp_instance {
>    struct
>    {
>      int outside_of_loop;     /* Statements generated outside loop.  */
> -    int inside_of_loop;      /* Statements generated inside loop.  */
>    } cost;
>  
> +  /* Inside-loop costs.  */
> +  stmt_vector_for_cost stmt_cost_vec;
> +
>    /* Loads permutation relatively to the stores, NULL if there is no
>       permutation.  */
>    VEC (int, heap) *load_permutation;
> @@ -142,7 +169,7 @@ DEF_VEC_ALLOC_P(slp_instance, heap);
>  #define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
>  #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
>  #define SLP_INSTANCE_OUTSIDE_OF_LOOP_COST(S)     (S)->cost.outside_of_loop
> -#define SLP_INSTANCE_INSIDE_OF_LOOP_COST(S)      (S)->cost.inside_of_loop
> +#define SLP_INSTANCE_STMT_COST_VEC(S)            (S)->stmt_cost_vec
>  #define SLP_INSTANCE_LOAD_PERMUTATION(S)         (S)->load_permutation
>  #define SLP_INSTANCE_LOADS(S)                    (S)->loads
>  #define SLP_INSTANCE_FIRST_LOAD_STMT(S)          (S)->first_load
> @@ -152,7 +179,6 @@ DEF_VEC_ALLOC_P(slp_instance, heap);
>  #define SLP_TREE_VEC_STMTS(S)                    (S)->vec_stmts
>  #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)          (S)->vec_stmts_size
>  #define SLP_TREE_OUTSIDE_OF_LOOP_COST(S)         (S)->cost.outside_of_loop
> -#define SLP_TREE_INSIDE_OF_LOOP_COST(S)          (S)->cost.inside_of_loop
>  
>  /* This structure is used in creation of an SLP tree.  Each instance
>     corresponds to the same operand in a group of scalar stmts in an SLP
> @@ -186,6 +212,7 @@ typedef struct _vect_peel_extended_info
>    struct _vect_peel_info peel_info;
>    unsigned int inside_cost;
>    unsigned int outside_cost;
> +  stmt_vector_for_cost stmt_cost_vec;
>  } *vect_peel_extended_info;
>  
>  /*-----------------------------------------------------------------*/
> @@ -274,6 +301,9 @@ typedef struct _loop_vec_info {
>    /* Hash table used to choose the best peeling option.  */
>    htab_t peeling_htab;
>  
> +  /* Cost data used by the target cost model.  */
> +  void *target_cost_data;
> +
>    /* When we have grouped data accesses with gaps, we may introduce invalid
>       memory accesses.  We peel the last iteration of the loop to prevent
>       this.  */
> @@ -307,6 +337,7 @@ typedef struct _loop_vec_info {
>  #define LOOP_VINFO_REDUCTIONS(L)           (L)->reductions
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_HTAB(L)         (L)->peeling_htab
> +#define LOOP_VINFO_TARGET_COST_DATA(L)     (L)->target_cost_data
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  
>  #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
> @@ -350,13 +381,18 @@ typedef struct _bb_vec_info {
>  
>    /* All data dependences in the basic block.  */
>    VEC (ddr_p, heap) *ddrs;
> +
> +  /* Cost data used by the target cost model.  */
> +  void *target_cost_data;
> +
>  } *bb_vec_info;
>  
> -#define BB_VINFO_BB(B)              (B)->bb
> -#define BB_VINFO_GROUPED_STORES(B)  (B)->grouped_stores
> -#define BB_VINFO_SLP_INSTANCES(B)   (B)->slp_instances
> -#define BB_VINFO_DATAREFS(B)        (B)->datarefs
> -#define BB_VINFO_DDRS(B)            (B)->ddrs
> +#define BB_VINFO_BB(B)               (B)->bb
> +#define BB_VINFO_GROUPED_STORES(B)   (B)->grouped_stores
> +#define BB_VINFO_SLP_INSTANCES(B)    (B)->slp_instances
> +#define BB_VINFO_DATAREFS(B)         (B)->datarefs
> +#define BB_VINFO_DDRS(B)             (B)->ddrs
> +#define BB_VINFO_TARGET_COST_DATA(B) (B)->target_cost_data
>  
>  static inline bb_vec_info
>  vec_info_for_bb (basic_block bb)
> @@ -534,7 +570,6 @@ typedef struct _stmt_vec_info {
>    struct
>    {
>      int outside_of_loop;     /* Statements generated outside loop.  */
> -    int inside_of_loop;      /* Statements generated inside loop.  */
>    } cost;
>  
>    /* The bb_vec_info with respect to which STMT is vectorized.  */
> @@ -594,7 +629,6 @@ typedef struct _stmt_vec_info {
>  
>  #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_scope)
>  #define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
> -#define STMT_VINFO_INSIDE_OF_LOOP_COST(S)  (S)->cost.inside_of_loop
>  
>  #define HYBRID_SLP_STMT(S)                ((S)->slp_type == hybrid)
>  #define PURE_SLP_STMT(S)                  ((S)->slp_type == pure_slp)
> @@ -733,21 +767,9 @@ is_loop_header_bb_p (basic_block bb)
>    return false;
>  }
>  
> -/* Set inside loop vectorization cost.  */
> +/* Set outside loop vectorization cost.  */
>  
>  static inline void
> -stmt_vinfo_set_inside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node,
> -				    int cost)
> -{
> -  if (slp_node)
> -    SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
> -  else
> -    STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> -}
> -
> -/* Set inside loop vectorization cost.  */
> -
> -static inline void
>  stmt_vinfo_set_outside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node,
>  				     int cost)
>  {
> @@ -782,6 +804,41 @@ int vect_get_stmt_cost (enum vect_cost_for_stmt ty
>                                                         dummy_type, dummy);
>  }
>  
> +/* Alias targetm.vectorize.init_cost.  */
> +
> +static inline void *
> +init_cost (struct loop *loop_info)
> +{
> +  return targetm.vectorize.init_cost (loop_info);
> +}
> +
> +/* Alias targetm.vectorize.add_stmt_cost.  */
> +
> +static inline unsigned
> +add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> +	       stmt_vec_info stmt_info, int misalign)
> +{
> +  return targetm.vectorize.add_stmt_cost (data, count, kind,
> +					  stmt_info, misalign);
> +}
> +
> +/* Alias targetm.vectorize.finish_cost.  */
> +
> +static inline unsigned
> +finish_cost (void *data)
> +{
> +  return targetm.vectorize.finish_cost (data);
> +}
> +
> +/* Alias targetm.vectorize.destroy_cost_data.  */
> +
> +static inline void
> +destroy_cost_data (void *data)
> +{
> +  targetm.vectorize.destroy_cost_data (data);
> +}
> +
> +
>  /*-----------------------------------------------------------------*/
>  /* Info on data references alignment.                              */
>  /*-----------------------------------------------------------------*/
> @@ -849,10 +906,14 @@ extern stmt_vec_info new_stmt_vec_info (gimple stm
>  extern void free_stmt_vec_info (gimple stmt);
>  extern tree vectorizable_function (gimple, tree, tree);
>  extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
> -                                    slp_tree);
> +                                    slp_tree, stmt_vector_for_cost *);
>  extern void vect_model_store_cost (stmt_vec_info, int, bool,
> -				   enum vect_def_type, slp_tree);
> -extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree);
> +				   enum vect_def_type, slp_tree,
> +				   stmt_vector_for_cost *);
> +extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree,
> +				  stmt_vector_for_cost *);
> +extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
> +				  enum vect_cost_for_stmt, stmt_vec_info, int);
>  extern void vect_finish_stmt_generation (gimple, gimple,
>                                           gimple_stmt_iterator *);
>  extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
> @@ -867,8 +928,10 @@ extern bool vect_analyze_stmt (gimple, bool *, slp
>  extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
>                                      tree, int, slp_tree);
>  extern void vect_get_load_cost (struct data_reference *, int, bool,
> -                                unsigned int *, unsigned int *);
> -extern void vect_get_store_cost (struct data_reference *, int, unsigned int *);
> +				unsigned int *, unsigned int *,
> +				stmt_vector_for_cost *);
> +extern void vect_get_store_cost (struct data_reference *, int,
> +				 unsigned int *, stmt_vector_for_cost *);
>  extern bool vect_supportable_shift (enum tree_code, tree);
>  extern void vect_get_vec_defs (tree, tree, gimple, VEC (tree, heap) **,
>  			       VEC (tree, heap) **, slp_tree, int);
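For readers following the header changes above, here is a rough sketch of how the four wrappers (init_cost, add_stmt_cost, finish_cost, destroy_cost_data) are meant to be called in sequence. This is not GCC code. Every toy_* name, the three-entry cost table, and the plain unsigned accumulator are assumptions for illustration; the real hooks take a struct loop *, a stmt_vec_info, and a misalignment, and a target may keep whatever it likes behind the void * handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical statement kinds; the real enum is vect_cost_for_stmt.  */
enum toy_kind { TOY_VECTOR_STMT, TOY_VECTOR_LOAD, TOY_VEC_PERM, TOY_NUM_KINDS };

/* Made-up per-kind unit costs for this toy target.  */
static const unsigned toy_unit_cost[TOY_NUM_KINDS] = { 1, 2, 3 };

/* Stand-in for init_cost: allocate a zeroed accumulator.  */
static void *
toy_init_cost (void)
{
  unsigned *total = malloc (sizeof *total);
  assert (total != NULL);
  *total = 0;
  return total;
}

/* Stand-in for add_stmt_cost: add COUNT statements of KIND to the
   running total and return this statement's contribution.  */
static unsigned
toy_add_stmt_cost (void *data, int count, enum toy_kind kind)
{
  unsigned cost = (unsigned) count * toy_unit_cost[kind];
  *(unsigned *) data += cost;
  return cost;
}

/* Stand-in for finish_cost: report the accumulated total.  */
static unsigned
toy_finish_cost (void *data)
{
  return *(unsigned *) data;
}

/* Stand-in for destroy_cost_data: release the accumulator.  */
static void
toy_destroy_cost_data (void *data)
{
  free (data);
}

/* Walk one full lifecycle and return the inside-of-loop cost.  */
static unsigned
toy_lifecycle (void)
{
  void *data = toy_init_cost ();
  unsigned inside_cost;

  toy_add_stmt_cost (data, 2, TOY_VECTOR_LOAD);  /* 2 * 2 */
  toy_add_stmt_cost (data, 1, TOY_VEC_PERM);     /* 1 * 3 */
  toy_add_stmt_cost (data, 2, TOY_VECTOR_STMT);  /* 2 * 1 */

  inside_cost = toy_finish_cost (data);
  toy_destroy_cost_data (data);
  return inside_cost;
}
```

The point of the opaque handle is that the vectorizer only ever sees a total from the finish step, so a target is free to do more than sum (for instance, weight statements by where they sit in the loop nest).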
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c	(revision 189292)
> +++ gcc/tree-vect-loop.c	(working copy)
> @@ -852,6 +852,7 @@ new_loop_vec_info (struct loop *loop)
>    LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
>    LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
>    LOOP_VINFO_PEELING_HTAB (res) = NULL;
> +  LOOP_VINFO_TARGET_COST_DATA (res) = init_cost (loop);
>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>  
>    return res;
> @@ -929,6 +930,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, b
>    if (LOOP_VINFO_PEELING_HTAB (loop_vinfo))
>      htab_delete (LOOP_VINFO_PEELING_HTAB (loop_vinfo));
>  
> +  destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
> +
>    free (loop_vinfo);
>    loop->aux = NULL;
>  }
> @@ -1362,7 +1365,7 @@ vect_analyze_loop_operations (loop_vec_info loop_v
>                             "not vectorized: relevant phi not supported: ");
>                    print_gimple_stmt (vect_dump, phi, 0, TDF_SLIM);
>                  }
> -              return false;
> +	      return false;
>              }
>          }
>  
> @@ -2498,7 +2501,6 @@ vect_estimate_min_profitable_iters (loop_vec_info
>    int nbbs = loop->num_nodes;
>    int npeel = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
>    int peel_guard_costs = 0;
> -  int innerloop_iters = 0, factor;
>    VEC (slp_instance, heap) *slp_instances;
>    slp_instance instance;
>  
> @@ -2544,20 +2546,11 @@ vect_estimate_min_profitable_iters (loop_vec_info
>       TODO: Consider assigning different costs to different scalar
>       statements.  */
>  
> -  /* FORNOW.  */
> -  if (loop->inner)
> -    innerloop_iters = 50; /* FIXME */
> -
>    for (i = 0; i < nbbs; i++)
>      {
>        gimple_stmt_iterator si;
>        basic_block bb = bbs[i];
>  
> -      if (bb->loop_father == loop->inner)
> - 	factor = innerloop_iters;
> -      else
> - 	factor = 1;
> -
>        for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>  	{
>  	  gimple stmt = gsi_stmt (si);
> @@ -2575,7 +2568,6 @@ vect_estimate_min_profitable_iters (loop_vec_info
>                   || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))))
>  	    continue;
>  
> -	  vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
>  	  /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
>  	     some of the "outside" costs are generated inside the outer-loop.  */
>  	  vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
> @@ -2592,14 +2584,9 @@ vect_estimate_min_profitable_iters (loop_vec_info
>  		    = vinfo_for_stmt (pattern_def_stmt);
>                    if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)
>                        || STMT_VINFO_LIVE_P (pattern_def_stmt_info))
> -		    {
> -                      vec_inside_cost
> -			+= STMT_VINFO_INSIDE_OF_LOOP_COST
> -			   (pattern_def_stmt_info) * factor;
> -                      vec_outside_cost
> -			+= STMT_VINFO_OUTSIDE_OF_LOOP_COST
> -			   (pattern_def_stmt_info);
> -                    }
> +		    vec_outside_cost
> +		      += STMT_VINFO_OUTSIDE_OF_LOOP_COST
> +		        (pattern_def_stmt_info);
>  		}
>  	    }
>  	}
> @@ -2725,11 +2712,12 @@ vect_estimate_min_profitable_iters (loop_vec_info
>    /* Add SLP costs.  */
>    slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
>    FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
> -    {
> -      vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
> -      vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
> -    }
> +    vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
>  
> +  /* Complete the target-specific cost calculation for the inside-of-loop
> +     costs.  */
> +  vec_inside_cost = finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
> +  
>    /* Calculate number of iterations required to make the vector version
>       profitable, relative to the loop bodies only.  The following condition
>       must hold true:
> @@ -2826,10 +2814,10 @@ vect_model_reduction_cost (stmt_vec_info stmt_info
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>  
> -
>    /* Cost of reduction op inside loop.  */
> -  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) 
> -    += ncopies * vect_get_stmt_cost (vector_stmt);
> +  unsigned inside_cost
> +    = add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> +		     ncopies, vector_stmt, stmt_info, 0);
>  
>    stmt = STMT_VINFO_STMT (stmt_info);
>  
> @@ -2915,7 +2903,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_reduction_cost: inside_cost = %d, "
> -             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
> +             "outside_cost = %d .", inside_cost,
>               STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
>  
>    return true;
> @@ -2929,16 +2917,20 @@ vect_model_reduction_cost (stmt_vec_info stmt_info
>  static void
>  vect_model_induction_cost (stmt_vec_info stmt_info, int ncopies)
>  {
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +
>    /* loop cost for vec_loop.  */
> -  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) 
> -    = ncopies * vect_get_stmt_cost (vector_stmt);
> +  unsigned inside_cost
> +    = add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), ncopies,
> +		     vector_stmt, stmt_info, 0);
> +
>    /* prologue cost for vec_init and vec_step.  */
>    STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)  
>      = 2 * vect_get_stmt_cost (scalar_to_vec);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_induction_cost: inside_cost = %d, "
> -             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
> +             "outside_cost = %d .", inside_cost,
>               STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
>  }
>  
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c	(revision 189292)
> +++ gcc/tree-vect-data-refs.c	(working copy)
> @@ -1205,7 +1205,7 @@ vector_alignment_reachable_p (struct data_referenc
>  
>  /* Calculate the cost of the memory access represented by DR.  */
>  
> -static void
> +static stmt_vector_for_cost
>  vect_get_data_access_cost (struct data_reference *dr,
>                             unsigned int *inside_cost,
>                             unsigned int *outside_cost)
> @@ -1216,15 +1216,19 @@ vect_get_data_access_cost (struct data_reference *
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>    int ncopies = vf / nunits;
> +  stmt_vector_for_cost stmt_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
>  
>    if (DR_IS_READ (dr))
> -    vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost);
> +    vect_get_load_cost (dr, ncopies, true, inside_cost,
> +			outside_cost, &stmt_cost_vec);
>    else
> -    vect_get_store_cost (dr, ncopies, inside_cost);
> +    vect_get_store_cost (dr, ncopies, inside_cost, &stmt_cost_vec);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
>               "outside_cost = %d.", *inside_cost, *outside_cost);
> +
> +  return stmt_cost_vec;
>  }
>  
>  
> @@ -1317,6 +1321,7 @@ vect_peeling_hash_get_lowest_cost (void **slot, vo
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    VEC (data_reference_p, heap) *datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
>    struct data_reference *dr;
> +  stmt_vector_for_cost stmt_cost_vec = NULL;
>  
>    FOR_EACH_VEC_ELT (data_reference_p, datarefs, i, dr)
>      {
> @@ -1330,7 +1335,8 @@ vect_peeling_hash_get_lowest_cost (void **slot, vo
>  
>        save_misalignment = DR_MISALIGNMENT (dr);
>        vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
> -      vect_get_data_access_cost (dr, &inside_cost, &outside_cost);
> +      stmt_cost_vec = vect_get_data_access_cost (dr, &inside_cost,
> +						 &outside_cost);
>        SET_DR_MISALIGNMENT (dr, save_misalignment);
>      }
>  
> @@ -1342,6 +1348,7 @@ vect_peeling_hash_get_lowest_cost (void **slot, vo
>      {
>        min->inside_cost = inside_cost;
>        min->outside_cost = outside_cost;
> +      min->stmt_cost_vec = stmt_cost_vec;
>        min->peel_info.dr = elem->dr;
>        min->peel_info.npeel = elem->npeel;
>      }
> @@ -1356,11 +1363,13 @@ vect_peeling_hash_get_lowest_cost (void **slot, vo
>  
>  static struct data_reference *
>  vect_peeling_hash_choose_best_peeling (loop_vec_info loop_vinfo,
> -                                       unsigned int *npeel)
> +                                       unsigned int *npeel,
> +				       stmt_vector_for_cost *stmt_cost_vec)
>  {
>     struct _vect_peel_extended_info res;
>  
>     res.peel_info.dr = NULL;
> +   res.stmt_cost_vec = NULL;
>  
>     if (flag_vect_cost_model)
>       {
> @@ -1377,6 +1386,7 @@ vect_peeling_hash_choose_best_peeling (loop_vec_in
>       }
>  
>     *npeel = res.peel_info.npeel;
> +   *stmt_cost_vec = res.stmt_cost_vec;
>     return res.peel_info.dr;
>  }
>  
> @@ -1493,6 +1503,7 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>    unsigned possible_npeel_number = 1;
>    tree vectype;
>    unsigned int nelements, mis, same_align_drs_max = 0;
> +  stmt_vector_for_cost stmt_cost_vec = NULL;
>  
>    if (vect_print_dump_info (REPORT_DETAILS))
>      fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ===");
> @@ -1697,10 +1708,10 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>            unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
>            unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
>  
> -          vect_get_data_access_cost (dr0, &load_inside_cost,
> -                                     &load_outside_cost);
> -          vect_get_data_access_cost (first_store, &store_inside_cost,
> -                                     &store_outside_cost);
> +          (void) vect_get_data_access_cost (dr0, &load_inside_cost,
> +					    &load_outside_cost);
> +          (void) vect_get_data_access_cost (first_store, &store_inside_cost,
> +					    &store_outside_cost);
>  
>            /* Calculate the penalty for leaving FIRST_STORE unaligned (by
>               aligning the load DR0).  */
> @@ -1764,7 +1775,8 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>        gcc_assert (!all_misalignments_unknown);
>  
>        /* Choose the best peeling from the hash table.  */
> -      dr0 = vect_peeling_hash_choose_best_peeling (loop_vinfo, &npeel);
> +      dr0 = vect_peeling_hash_choose_best_peeling (loop_vinfo, &npeel,
> +						   &stmt_cost_vec);
>        if (!dr0 || !npeel)
>          do_peeling = false;
>      }
> @@ -1848,6 +1860,8 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>  
>        if (do_peeling)
>          {
> +	  stmt_info_for_cost *si;
> +
>            /* (1.2) Update the DR_MISALIGNMENT of each data reference DR_i.
>               If the misalignment of DR_i is identical to that of dr0 then set
>               DR_MISALIGNMENT (DR_i) to zero.  If the misalignment of DR_i and
> @@ -1871,6 +1885,18 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>            if (vect_print_dump_info (REPORT_DETAILS))
>              fprintf (vect_dump, "Peeling for alignment will be applied.");
>  
> +	  /* We've delayed passing the inside-loop peeling costs to the
> +	     target cost model until we were sure peeling would happen.
> +	     Do so now.  */
> +	  if (stmt_cost_vec)
> +	    {
> +	      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, i, si)
> +		(void) add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> +				      si->count, si->kind,
> +				      vinfo_for_stmt (si->stmt), si->misalign);
> +	      VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
> +	    }
> +
>  	  stat = vect_verify_datarefs_alignment (loop_vinfo, NULL);
>  	  gcc_assert (stat);
>            return stat;
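The peeling changes above queue inside-loop costs in a vector and hand them to the target model only once peeling is certain. A minimal sketch of that record-then-flush idea follows, under stated assumptions: the toy_* names, the fixed-size array, and the bare unit_cost field are all hypothetical stand-ins for stmt_vector_for_cost, stmt_info_for_cost, and the target hook, not the patch's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 8

/* One queued cost: COUNT statements at UNIT_COST each (hypothetical).  */
struct toy_entry
{
  int count;
  unsigned unit_cost;
};

/* Fixed-size stand-in for the heap-allocated cost vector.  */
struct toy_cost_vec
{
  struct toy_entry entries[TOY_MAX_ENTRIES];
  size_t length;
};

/* With a non-NULL VEC, queue the cost for later; with a NULL VEC,
   commit it to *MODEL_TOTAL at once.  Either way, return an estimate
   of this statement's cost.  */
static unsigned
toy_record_stmt_cost (struct toy_cost_vec *vec, unsigned *model_total,
                      int count, unsigned unit_cost)
{
  unsigned estimate = (unsigned) count * unit_cost;

  if (vec)
    {
      assert (vec->length < TOY_MAX_ENTRIES);
      vec->entries[vec->length].count = count;
      vec->entries[vec->length].unit_cost = unit_cost;
      vec->length++;
    }
  else
    *model_total += estimate;

  return estimate;
}

/* Commit every queued entry to *MODEL_TOTAL and empty the queue.  */
static void
toy_flush (struct toy_cost_vec *vec, unsigned *model_total)
{
  size_t i;

  for (i = 0; i < vec->length; i++)
    *model_total += (unsigned) vec->entries[i].count
                    * vec->entries[i].unit_cost;
  vec->length = 0;
}

/* Queue two costs, then commit them only if DO_PEELING is nonzero.
   Return what the cost model ends up seeing.  */
static unsigned
toy_peeling_demo (int do_peeling)
{
  struct toy_cost_vec vec = { { { 0, 0 } }, 0 };
  unsigned model_total = 0;

  toy_record_stmt_cost (&vec, &model_total, 2, 3);
  toy_record_stmt_cost (&vec, &model_total, 1, 4);

  if (do_peeling)
    toy_flush (&vec, &model_total);

  return model_total;
}
```

If the peeling candidate is rejected, the queued entries are simply dropped and the model never sees them, which is why the estimate is returned separately from the commit.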
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c	(revision 189292)
> +++ gcc/tree-vect-stmts.c	(working copy)
> @@ -41,6 +41,66 @@ along with GCC; see the file COPYING3.  If not see
>  #include "langhooks.h"
>  
>  
> +/* Return the vectorized type for the given statement.  */
> +
> +tree
> +stmt_vectype (struct _stmt_vec_info *stmt_info)
> +{
> +  return STMT_VINFO_VECTYPE (stmt_info);
> +}
> +
> +/* Return TRUE iff the given statement is in an inner loop relative to
> +   the loop being vectorized.  */
> +bool
> +stmt_in_inner_loop_p (struct _stmt_vec_info *stmt_info)
> +{
> +  gimple stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block bb = gimple_bb (stmt);
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  struct loop *loop;
> +
> +  if (!loop_vinfo)
> +    return false;
> +
> +  loop = LOOP_VINFO_LOOP (loop_vinfo);
> +
> +  return (bb->loop_father == loop->inner);
> +}
> +
> +/* Record the cost of a statement, either by directly informing the 
> +   target model or by saving it in a vector for later processing.
> +   Return a preliminary estimate of the statement's cost.  */
> +
> +unsigned
> +record_stmt_cost (stmt_vector_for_cost *stmt_cost_vec, int count,
> +		  enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
> +		  int misalign)
> +{
> +  if (stmt_cost_vec)
> +    {
> +      tree vectype = stmt_vectype (stmt_info);
> +      add_stmt_info_to_vec (stmt_cost_vec, count, kind,
> +			    STMT_VINFO_STMT (stmt_info), misalign);
> +      return (unsigned)
> +	(targetm.vectorize.builtin_vectorization_cost (kind, vectype, misalign)
> +	 * count);
> +	 
> +    }
> +  else
> +    {
> +      loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +      bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> +      void *target_cost_data;
> +
> +      if (loop_vinfo)
> +	target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
> +      else
> +	target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
> +
> +      return add_stmt_cost (target_cost_data, count, kind, stmt_info, misalign);
> +    }
> +}
> +
>  /* Return a variable of type ELEM_TYPE[NELEMS].  */
>  
>  static tree
> @@ -735,7 +795,8 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info lo
>  
>  void
>  vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies,
> -			enum vect_def_type *dt, slp_tree slp_node)
> +			enum vect_def_type *dt, slp_tree slp_node,
> +			stmt_vector_for_cost *stmt_cost_vec)
>  {
>    int i;
>    int inside_cost = 0, outside_cost = 0;
> @@ -744,8 +805,6 @@ vect_model_simple_cost (stmt_vec_info stmt_info, i
>    if (PURE_SLP_STMT (stmt_info))
>      return;
>  
> -  inside_cost = ncopies * vect_get_stmt_cost (vector_stmt); 
> -
>    /* FORNOW: Assuming maximum 2 args per stmts.  */
>    for (i = 0; i < 2; i++)
>      {
> @@ -753,13 +812,16 @@ vect_model_simple_cost (stmt_vec_info stmt_info, i
>  	outside_cost += vect_get_stmt_cost (vector_stmt); 
>      }
>  
> +  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
> +  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
> +
> +  /* Pass the inside-of-loop statements to the target-specific cost model.  */
> +  inside_cost = record_stmt_cost (stmt_cost_vec, ncopies, vector_stmt,
> +				  stmt_info, 0);
> +
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
>               "outside_cost = %d .", inside_cost, outside_cost);
> -
> -  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
> -  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
> -  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
>  }
>  
>  
> @@ -773,18 +835,26 @@ vect_model_promotion_demotion_cost (stmt_vec_info
>  				    enum vect_def_type *dt, int pwr)
>  {
>    int i, tmp;
> -  int inside_cost = 0, outside_cost = 0, single_stmt_cost;
> +  int inside_cost = 0, outside_cost = 0;
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> +  void *target_cost_data;
>  
>    /* The SLP costs were already calculated during SLP tree build.  */
>    if (PURE_SLP_STMT (stmt_info))
>      return;
>  
> -  single_stmt_cost = vect_get_stmt_cost (vec_promote_demote);
> +  if (loop_vinfo)
> +    target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
> +  else
> +    target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
> +
>    for (i = 0; i < pwr + 1; i++)
>      {
>        tmp = (STMT_VINFO_TYPE (stmt_info) == type_promotion_vec_info_type) ?
>  	(i + 1) : i;
> -      inside_cost += vect_pow2 (tmp) * single_stmt_cost;
> +      inside_cost += add_stmt_cost (target_cost_data, vect_pow2 (tmp),
> +				    vec_promote_demote, stmt_info, 0);
>      }
>  
>    /* FORNOW: Assuming maximum 2 args per stmts.  */
> @@ -799,7 +869,6 @@ vect_model_promotion_demotion_cost (stmt_vec_info
>               "outside_cost = %d .", inside_cost, outside_cost);
>  
>    /* Set the costs in STMT_INFO.  */
> -  stmt_vinfo_set_inside_of_loop_cost (stmt_info, NULL, inside_cost);
>    stmt_vinfo_set_outside_of_loop_cost (stmt_info, NULL, outside_cost);
>  }
>  
> @@ -829,7 +898,7 @@ vect_cost_group_size (stmt_vec_info stmt_info)
>  void
>  vect_model_store_cost (stmt_vec_info stmt_info, int ncopies,
>  		       bool store_lanes_p, enum vect_def_type dt,
> -		       slp_tree slp_node)
> +		       slp_tree slp_node, stmt_vector_for_cost *stmt_cost_vec)
>  {
>    int group_size;
>    unsigned int inside_cost = 0, outside_cost = 0;
> @@ -873,8 +942,10 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
>    if (!store_lanes_p && group_size > 1)
>      {
>        /* Uses a high and low interleave operation for each needed permute.  */
> -      inside_cost = ncopies * exact_log2(group_size) * group_size
> -        * vect_get_stmt_cost (vec_perm);
> +      
> +      int nstmts = ncopies * exact_log2 (group_size) * group_size;
> +      inside_cost = record_stmt_cost (stmt_cost_vec, nstmts, vec_perm,
> +				      stmt_info, 0);
>  
>        if (vect_print_dump_info (REPORT_COST))
>          fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d .",
> @@ -882,14 +953,13 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
>      }
>  
>    /* Costs of the stores.  */
> -  vect_get_store_cost (first_dr, ncopies, &inside_cost);
> +  vect_get_store_cost (first_dr, ncopies, &inside_cost, stmt_cost_vec);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
>               "outside_cost = %d .", inside_cost, outside_cost);
>  
>    /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
> -  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
>    stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
>  }
>  
> @@ -897,15 +967,19 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
>  /* Calculate cost of DR's memory access.  */
>  void
>  vect_get_store_cost (struct data_reference *dr, int ncopies,
> -                     unsigned int *inside_cost)
> +		     unsigned int *inside_cost,
> +		     stmt_vector_for_cost *stmt_cost_vec)
>  {
>    int alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
> +  gimple stmt = DR_STMT (dr);
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>  
>    switch (alignment_support_scheme)
>      {
>      case dr_aligned:
>        {
> -        *inside_cost += ncopies * vect_get_stmt_cost (vector_store);
> +	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +					  vector_store, stmt_info, 0);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_store_cost: aligned.");
> @@ -915,14 +989,10 @@ vect_get_store_cost (struct data_reference *dr, in
>  
>      case dr_unaligned_supported:
>        {
> -        gimple stmt = DR_STMT (dr);
> -        stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> -        tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> -
>          /* Here, we assign an additional cost for the unaligned store.  */
> -        *inside_cost += ncopies
> -          * targetm.vectorize.builtin_vectorization_cost (unaligned_store,
> -                                 vectype, DR_MISALIGNMENT (dr));
> +	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +					  unaligned_store, stmt_info,
> +					  DR_MISALIGNMENT (dr));
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_store_cost: unaligned supported by "
> @@ -956,7 +1026,7 @@ vect_get_store_cost (struct data_reference *dr, in
>  
>  void
>  vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, bool load_lanes_p,
> -		      slp_tree slp_node)
> +		      slp_tree slp_node, stmt_vector_for_cost *stmt_cost_vec)
>  {
>    int group_size;
>    gimple first_stmt;
> @@ -988,8 +1058,9 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>    if (!load_lanes_p && group_size > 1)
>      {
>        /* Uses an even and odd extract operations for each needed permute.  */
> -      inside_cost = ncopies * exact_log2(group_size) * group_size
> -	* vect_get_stmt_cost (vec_perm);
> +      int nstmts = ncopies * exact_log2 (group_size) * group_size;
> +      inside_cost += record_stmt_cost (stmt_cost_vec, nstmts, vec_perm,
> +				       stmt_info, 0);
>  
>        if (vect_print_dump_info (REPORT_COST))
>          fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
> @@ -1001,24 +1072,23 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>      {
>        /* N scalar loads plus gathering them into a vector.  */
>        tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> -      inside_cost += (vect_get_stmt_cost (scalar_load) * ncopies
> -		      * TYPE_VECTOR_SUBPARTS (vectype));
> -      inside_cost += ncopies
> -	* targetm.vectorize.builtin_vectorization_cost (vec_construct,
> -							vectype, 0);
> +      inside_cost += record_stmt_cost (stmt_cost_vec,
> +				       ncopies * TYPE_VECTOR_SUBPARTS (vectype),
> +				       scalar_load, stmt_info, 0);
> +      inside_cost += record_stmt_cost (stmt_cost_vec, ncopies, vec_construct,
> +				       stmt_info, 0);
>      }
>    else
>      vect_get_load_cost (first_dr, ncopies,
>  			((!STMT_VINFO_GROUPED_ACCESS (stmt_info))
>  			 || group_size > 1 || slp_node),
> -			&inside_cost, &outside_cost);
> +			&inside_cost, &outside_cost, stmt_cost_vec);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
>               "outside_cost = %d .", inside_cost, outside_cost);
>  
>    /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
> -  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
>    stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
>  }
>  
> @@ -1026,16 +1096,20 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>  /* Calculate cost of DR's memory access.  */
>  void
>  vect_get_load_cost (struct data_reference *dr, int ncopies,
> -                    bool add_realign_cost, unsigned int *inside_cost,
> -                    unsigned int *outside_cost)
> +		    bool add_realign_cost, unsigned int *inside_cost,
> +		    unsigned int *outside_cost,
> +		    stmt_vector_for_cost *stmt_cost_vec)
>  {
>    int alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
> +  gimple stmt = DR_STMT (dr);
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>  
>    switch (alignment_support_scheme)
>      {
>      case dr_aligned:
>        {
> -        *inside_cost += ncopies * vect_get_stmt_cost (vector_load); 
> +	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +					  vector_load, stmt_info, 0);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_load_cost: aligned.");
> @@ -1044,14 +1118,11 @@ vect_get_load_cost (struct data_reference *dr, int
>        }
>      case dr_unaligned_supported:
>        {
> -        gimple stmt = DR_STMT (dr);
> -        stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> -        tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +        /* Here, we assign an additional cost for the unaligned load.  */
> +	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +					  unaligned_load, stmt_info,
> +					  DR_MISALIGNMENT (dr));
>  
> -        /* Here, we assign an additional cost for the unaligned load.  */
> -        *inside_cost += ncopies
> -          * targetm.vectorize.builtin_vectorization_cost (unaligned_load,
> -                                           vectype, DR_MISALIGNMENT (dr));
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_load_cost: unaligned supported by "
>                     "hardware.");
> @@ -1060,14 +1131,17 @@ vect_get_load_cost (struct data_reference *dr, int
>        }
>      case dr_explicit_realign:
>        {
> -        *inside_cost += ncopies * (2 * vect_get_stmt_cost (vector_load)
> -				   + vect_get_stmt_cost (vec_perm));
> +	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies * 2,
> +					  vector_load, stmt_info, 0);
> +	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +					  vec_perm, stmt_info, 0);
>  
>          /* FIXME: If the misalignment remains fixed across the iterations of
>             the containing loop, the following cost should be added to the
>             outside costs.  */
>          if (targetm.vectorize.builtin_mask_for_load)
> -          *inside_cost += vect_get_stmt_cost (vector_stmt);
> +	  *inside_cost += record_stmt_cost (stmt_cost_vec, 1, vector_stmt,
> +					    stmt_info, 0);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_load_cost: explicit realign");
> @@ -1094,8 +1168,10 @@ vect_get_load_cost (struct data_reference *dr, int
>                *outside_cost += vect_get_stmt_cost (vector_stmt);
>            }
>  
> -        *inside_cost += ncopies * (vect_get_stmt_cost (vector_load)
> -				   + vect_get_stmt_cost (vec_perm));
> +	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +					  vector_load, stmt_info, 0);
> +	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +					  vec_perm, stmt_info, 0);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump,
> @@ -1719,7 +1795,7 @@ vectorizable_call (gimple stmt, gimple_stmt_iterat
>        STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
>        if (vect_print_dump_info (REPORT_DETAILS))
>          fprintf (vect_dump, "=== vectorizable_call ===");
> -      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
> +      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
>        return true;
>      }
>  
> @@ -2433,7 +2509,7 @@ vectorizable_conversion (gimple stmt, gimple_stmt_
>        if (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR)
>          {
>  	  STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
> -	  vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
> +	  vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
>  	}
>        else if (modifier == NARROW)
>  	{
> @@ -2841,7 +2917,7 @@ vectorizable_assignment (gimple stmt, gimple_stmt_
>        STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
>        if (vect_print_dump_info (REPORT_DETAILS))
>          fprintf (vect_dump, "=== vectorizable_assignment ===");
> -      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
> +      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
>        return true;
>      }
>  
> @@ -3187,7 +3263,7 @@ vectorizable_shift (gimple stmt, gimple_stmt_itera
>        STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
>        if (vect_print_dump_info (REPORT_DETAILS))
>          fprintf (vect_dump, "=== vectorizable_shift ===");
> -      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
> +      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
>        return true;
>      }
>  
> @@ -3565,7 +3641,7 @@ vectorizable_operation (gimple stmt, gimple_stmt_i
>        STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
>        if (vect_print_dump_info (REPORT_DETAILS))
>          fprintf (vect_dump, "=== vectorizable_operation ===");
> -      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
> +      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
>        return true;
>      }
>  
> @@ -3938,7 +4014,7 @@ vectorizable_store (gimple stmt, gimple_stmt_itera
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
> -      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL);
> +      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL, NULL);
>        return true;
>      }
>  
> @@ -4494,7 +4570,7 @@ vectorizable_load (gimple stmt, gimple_stmt_iterat
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
> -      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL);
> +      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL, NULL);
>        return true;
>      }
>  
> @@ -5934,7 +6010,6 @@ new_stmt_vec_info (gimple stmt, loop_vec_info loop
>      STMT_VINFO_DEF_TYPE (res) = vect_internal_def;
>  
>    STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
> -  STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
>    STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
>    STMT_SLP_TYPE (res) = loop_vect;
>    GROUP_FIRST_ELEMENT (res) = NULL;
> Index: gcc/config/spu/spu.c
> ===================================================================
> --- gcc/config/spu/spu.c	(revision 189292)
> +++ gcc/config/spu/spu.c	(working copy)
> @@ -443,6 +443,18 @@ static void spu_setup_incoming_varargs (cumulative
>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST spu_builtin_vectorization_cost
>  
> +#undef TARGET_VECTORIZE_INIT_COST
> +#define TARGET_VECTORIZE_INIT_COST spu_init_cost
> +
> +#undef TARGET_VECTORIZE_ADD_STMT_COST
> +#define TARGET_VECTORIZE_ADD_STMT_COST spu_add_stmt_cost
> +
> +#undef TARGET_VECTORIZE_FINISH_COST
> +#define TARGET_VECTORIZE_FINISH_COST spu_finish_cost
> +
> +#undef TARGET_VECTORIZE_DESTROY_COST_DATA
> +#define TARGET_VECTORIZE_DESTROY_COST_DATA spu_destroy_cost_data
> +
>  #undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
>  #define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE spu_vector_alignment_reachable
>  
> @@ -6947,6 +6959,59 @@ spu_builtin_vectorization_cost (enum vect_cost_for
>      }
>  }
>  
> +/* Implement targetm.vectorize.init_cost.  */
> +
> +void *
> +spu_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
> +{
> +  unsigned *cost = XNEW (unsigned);
> +  *cost = 0;
> +  return cost;
> +}
> +
> +/* Implement targetm.vectorize.add_stmt_cost.  */
> +
> +unsigned
> +spu_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> +		   struct _stmt_vec_info *stmt_info, int misalign)
> +{
> +  unsigned *cost = (unsigned *) data;
> +  unsigned retval = 0;
> +
> +  if (flag_vect_cost_model)
> +    {
> +      tree vectype = stmt_vectype (stmt_info);
> +      int stmt_cost = spu_builtin_vectorization_cost (kind, vectype, misalign);
> +
> +      /* Statements in an inner loop relative to the loop being
> +	 vectorized are weighted more heavily.  The value here is
> +	 arbitrary and could potentially be improved with analysis.  */
> +      if (stmt_in_inner_loop_p (stmt_info))
> +	count *= 50;  /* FIXME.  */
> +
> +      retval = (unsigned) (count * stmt_cost);
> +      *cost += retval;
> +    }
> +
> +  return retval;
> +}
> +
> +/* Implement targetm.vectorize.finish_cost.  */
> +
> +unsigned
> +spu_finish_cost (void *data)
> +{
> +  return *((unsigned *) data);
> +}
> +
> +/* Implement targetm.vectorize.destroy_cost_data.  */
> +
> +void
> +spu_destroy_cost_data (void *data)
> +{
> +  free (data);
> +}
> +
>  /* Return true iff, data reference of TYPE can reach vector alignment (16)
>     after applying N number of iterations.  This routine does not determine
>     how may iterations are required to reach desired alignment.  */
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c	(revision 189292)
> +++ gcc/config/i386/i386.c	(working copy)
> @@ -40132,6 +40132,59 @@ ix86_autovectorize_vector_sizes (void)
>    return (TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
>  }
>  
> +/* Implement targetm.vectorize.init_cost.  */
> +
> +void *
> +ix86_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
> +{
> +  unsigned *cost = XNEW (unsigned);
> +  *cost = 0;
> +  return cost;
> +}
> +
> +/* Implement targetm.vectorize.add_stmt_cost.  */
> +
> +unsigned
> +ix86_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> +		    struct _stmt_vec_info *stmt_info, int misalign)
> +{
> +  unsigned *cost = (unsigned *) data;
> +  unsigned retval = 0;
> +
> +  if (flag_vect_cost_model)
> +    {
> +      tree vectype = stmt_vectype (stmt_info);
> +      int stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
> +
> +      /* Statements in an inner loop relative to the loop being
> +	 vectorized are weighted more heavily.  The value here is
> +	 arbitrary and could potentially be improved with analysis.  */
> +      if (stmt_in_inner_loop_p (stmt_info))
> +	count *= 50;  /* FIXME.  */
> +
> +      retval = (unsigned) (count * stmt_cost);
> +      *cost += retval;
> +    }
> +
> +  return retval;
> +}
> +
> +/* Implement targetm.vectorize.finish_cost.  */
> +
> +unsigned
> +ix86_finish_cost (void *data)
> +{
> +  return *((unsigned *) data);
> +}
> +
> +/* Implement targetm.vectorize.destroy_cost_data.  */
> +
> +void
> +ix86_destroy_cost_data (void *data)
> +{
> +  free (data);
> +}
> +
>  /* Validate target specific memory model bits in VAL. */
>  
>  static unsigned HOST_WIDE_INT
> @@ -40442,6 +40495,14 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
>  #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
>  #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
>    ix86_autovectorize_vector_sizes
> +#undef TARGET_VECTORIZE_INIT_COST
> +#define TARGET_VECTORIZE_INIT_COST ix86_init_cost
> +#undef TARGET_VECTORIZE_ADD_STMT_COST
> +#define TARGET_VECTORIZE_ADD_STMT_COST ix86_add_stmt_cost
> +#undef TARGET_VECTORIZE_FINISH_COST
> +#define TARGET_VECTORIZE_FINISH_COST ix86_finish_cost
> +#undef TARGET_VECTORIZE_DESTROY_COST_DATA
> +#define TARGET_VECTORIZE_DESTROY_COST_DATA ix86_destroy_cost_data
>  
>  #undef TARGET_SET_CURRENT_FUNCTION
>  #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 189292)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -1288,6 +1288,14 @@ static const struct attribute_spec rs6000_attribut
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
>    rs6000_preferred_simd_mode
> +#undef TARGET_VECTORIZE_INIT_COST
> +#define TARGET_VECTORIZE_INIT_COST rs6000_init_cost
> +#undef TARGET_VECTORIZE_ADD_STMT_COST
> +#define TARGET_VECTORIZE_ADD_STMT_COST rs6000_add_stmt_cost
> +#undef TARGET_VECTORIZE_FINISH_COST
> +#define TARGET_VECTORIZE_FINISH_COST rs6000_finish_cost
> +#undef TARGET_VECTORIZE_DESTROY_COST_DATA
> +#define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
>  
>  #undef TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
> @@ -3563,6 +3571,59 @@ rs6000_preferred_simd_mode (enum machine_mode mode
>    return word_mode;
>  }
>  
> +/* Implement targetm.vectorize.init_cost.  */
> +
> +void *
> +rs6000_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
> +{
> +  unsigned *cost = XNEW (unsigned);
> +  *cost = 0;
> +  return cost;
> +}
> +
> +/* Implement targetm.vectorize.add_stmt_cost.  */
> +
> +unsigned
> +rs6000_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> +		      struct _stmt_vec_info *stmt_info, int misalign)
> +{
> +  unsigned *cost = (unsigned *) data;
> +  unsigned retval = 0;
> +
> +  if (flag_vect_cost_model)
> +    {
> +      tree vectype = stmt_vectype (stmt_info);
> +      int stmt_cost = rs6000_builtin_vectorization_cost (kind, vectype,
> +							 misalign);
> +      /* Statements in an inner loop relative to the loop being
> +	 vectorized are weighted more heavily.  The value here is
> +	 arbitrary and could potentially be improved with analysis.  */
> +      if (stmt_in_inner_loop_p (stmt_info))
> +	count *= 50;  /* FIXME.  */
> +
> +      retval = (unsigned) (count * stmt_cost);
> +      *cost += retval;
> +    }
> +
> +  return retval;
> +}
> +
> +/* Implement targetm.vectorize.finish_cost.  */
> +
> +unsigned
> +rs6000_finish_cost (void *data)
> +{
> +  return *((unsigned *) data);
> +}
> +
> +/* Implement targetm.vectorize.destroy_cost_data.  */
> +
> +void
> +rs6000_destroy_cost_data (void *data)
> +{
> +  free (data);
> +}
> +
>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
>     library with vectorized intrinsics.  */
>  
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c	(revision 189292)
> +++ gcc/tree-vect-slp.c	(working copy)
> @@ -94,6 +94,7 @@ vect_free_slp_instance (slp_instance instance)
>    vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
>    VEC_free (int, heap, SLP_INSTANCE_LOAD_PERMUTATION (instance));
>    VEC_free (slp_tree, heap, SLP_INSTANCE_LOADS (instance));
> +  VEC_free (stmt_info_for_cost, heap, SLP_INSTANCE_STMT_COST_VEC (instance));
>  }
>  
>  
> @@ -122,7 +123,6 @@ vect_create_new_slp_node (VEC (gimple, heap) *scal
>    SLP_TREE_VEC_STMTS (node) = NULL;
>    SLP_TREE_CHILDREN (node) = VEC_alloc (slp_void_p, heap, nops);
>    SLP_TREE_OUTSIDE_OF_LOOP_COST (node) = 0;
> -  SLP_TREE_INSIDE_OF_LOOP_COST (node) = 0;
>  
>    return node;
>  }
> @@ -179,7 +179,8 @@ static bool
>  vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
>                               slp_tree slp_node, gimple stmt,
>  			     int ncopies_for_cost, bool first,
> -                             VEC (slp_oprnd_info, heap) **oprnds_info)
> +                             VEC (slp_oprnd_info, heap) **oprnds_info,
> +			     stmt_vector_for_cost *stmt_cost_vec)
>  {
>    tree oprnd;
>    unsigned int i, number_of_oprnds;
> @@ -320,7 +321,7 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vi
>  	      if (REFERENCE_CLASS_P (lhs))
>  		/* Store.  */
>                  vect_model_store_cost (stmt_info, ncopies_for_cost, false,
> -                                        dt, slp_node);
> +				       dt, slp_node, stmt_cost_vec);
>  	      else
>  		{
>  		  enum vect_def_type dts[2];
> @@ -329,7 +330,7 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vi
>  		  /* Not memory operation (we don't call this function for
>  		     loads).  */
>  		  vect_model_simple_cost (stmt_info, ncopies_for_cost, dts,
> -					  slp_node);
> +					  slp_node, stmt_cost_vec);
>  		}
>  	    }
>  	}
> @@ -446,12 +447,12 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vi
>  
>  static bool
>  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
> -                     slp_tree *node, unsigned int group_size,
> -                     int *inside_cost, int *outside_cost,
> +                     slp_tree *node, unsigned int group_size, int *outside_cost,
>                       int ncopies_for_cost, unsigned int *max_nunits,
>                       VEC (int, heap) **load_permutation,
>                       VEC (slp_tree, heap) **loads,
> -                     unsigned int vectorization_factor, bool *loads_permuted)
> +                     unsigned int vectorization_factor, bool *loads_permuted,
> +		     stmt_vector_for_cost *stmt_cost_vec)
>  {
>    unsigned int i;
>    VEC (gimple, heap) *stmts = SLP_TREE_SCALAR_STMTS (*node);
> @@ -470,7 +471,7 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>    HOST_WIDE_INT dummy;
>    bool permutation = false;
>    unsigned int load_place;
> -  gimple first_load, prev_first_load = NULL;
> +  gimple first_load = NULL, prev_first_load = NULL, old_first_load = NULL;
>    VEC (slp_oprnd_info, heap) *oprnds_info;
>    unsigned int nops;
>    slp_oprnd_info oprnd_info;
> @@ -711,7 +712,8 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>  	      /* Store.  */
>  	      if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node,
>  						stmt, ncopies_for_cost,
> -						(i == 0), &oprnds_info))
> +						(i == 0), &oprnds_info,
> +						stmt_cost_vec))
>  		{
>  	  	  vect_free_oprnd_info (&oprnds_info);
>   		  return false;
> @@ -754,6 +756,7 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>                    return false;
>                  }
>  
> +	      old_first_load = first_load;
>                first_load = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt));
>                if (prev_first_load)
>                  {
> @@ -778,7 +781,9 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>                else
>                  prev_first_load = first_load;
>  
> -              if (first_load == stmt)
> +	      /* In some cases a group of loads is just the same load
> +		 repeated N times.  Only analyze its cost once.  */
> +              if (first_load == stmt && old_first_load != first_load)
>                  {
>                    first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
>                    if (vect_supportable_dr_alignment (first_dr, false)
> @@ -797,7 +802,8 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>  
>                    /* Analyze costs (for the first stmt in the group).  */
>                    vect_model_load_cost (vinfo_for_stmt (stmt),
> -                                        ncopies_for_cost, false, *node);
> +                                        ncopies_for_cost, false, *node,
> +					stmt_cost_vec);
>                  }
>  
>                /* Store the place of this load in the interleaving chain.  In
> @@ -871,7 +877,7 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>  	  /* Find the def-stmts.  */
>  	  if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node, stmt,
>  					    ncopies_for_cost, (i == 0),
> -					    &oprnds_info))
> +					    &oprnds_info, stmt_cost_vec))
>  	    {
>  	      vect_free_oprnd_info (&oprnds_info);
>  	      return false;
> @@ -880,7 +886,6 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>      }
>  
>    /* Add the costs of the node to the overall instance costs.  */
> -  *inside_cost += SLP_TREE_INSIDE_OF_LOOP_COST (*node);
>    *outside_cost += SLP_TREE_OUTSIDE_OF_LOOP_COST (*node);
>  
>    /* Grouped loads were reached - stop the recursion.  */
> @@ -889,11 +894,10 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>        VEC_safe_push (slp_tree, heap, *loads, *node);
>        if (permutation)
>          {
> -
> +	  gimple first_stmt = VEC_index (gimple, stmts, 0);
>            *loads_permuted = true;
> -          *inside_cost 
> -            += targetm.vectorize.builtin_vectorization_cost (vec_perm, NULL, 0) 
> -               * group_size;
> +	  (void) record_stmt_cost (stmt_cost_vec, group_size, vec_perm, 
> +				   vinfo_for_stmt (first_stmt), 0);
>          }
>        else
>          {
> @@ -919,9 +923,10 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>        child = vect_create_new_slp_node (oprnd_info->def_stmts);
>        if (!child
>            || !vect_build_slp_tree (loop_vinfo, bb_vinfo, &child, group_size,
> -				inside_cost, outside_cost, ncopies_for_cost,
> -				max_nunits, load_permutation, loads,
> -				vectorization_factor, loads_permuted))
> +				   outside_cost, ncopies_for_cost,
> +				   max_nunits, load_permutation, loads,
> +				   vectorization_factor, loads_permuted,
> +				   stmt_cost_vec))
>          {
>  	  if (child)
>  	    oprnd_info->def_stmts = NULL;
> @@ -1459,13 +1464,14 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>    tree vectype, scalar_type = NULL_TREE;
>    gimple next;
>    unsigned int vectorization_factor = 0;
> -  int inside_cost = 0, outside_cost = 0, ncopies_for_cost, i;
> +  int outside_cost = 0, ncopies_for_cost, i;
>    unsigned int max_nunits = 0;
>    VEC (int, heap) *load_permutation;
>    VEC (slp_tree, heap) *loads;
>    struct data_reference *dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
>    bool loads_permuted = false;
>    VEC (gimple, heap) *scalar_stmts;
> +  stmt_vector_for_cost stmt_cost_vec;
>  
>    if (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)))
>      {
> @@ -1551,12 +1557,14 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>  
>    load_permutation = VEC_alloc (int, heap, group_size * group_size);
>    loads = VEC_alloc (slp_tree, heap, group_size);
> +  stmt_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 10);
>  
>    /* Build the tree for the SLP instance.  */
>    if (vect_build_slp_tree (loop_vinfo, bb_vinfo, &node, group_size,
> -                           &inside_cost, &outside_cost, ncopies_for_cost,
> +                           &outside_cost, ncopies_for_cost,
>  			   &max_nunits, &load_permutation, &loads,
> -			   vectorization_factor, &loads_permuted))
> +			   vectorization_factor, &loads_permuted,
> +			   &stmt_cost_vec))
>      {
>        /* Calculate the unrolling factor based on the smallest type.  */
>        if (max_nunits > nunits)
> @@ -1568,6 +1576,7 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>            if (vect_print_dump_info (REPORT_SLP))
>              fprintf (vect_dump, "Build SLP failed: unrolling required in basic"
>                                 " block SLP");
> +	  VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
>            return false;
>          }
>  
> @@ -1577,7 +1586,7 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>        SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
>        SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
>        SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (new_instance) = outside_cost;
> -      SLP_INSTANCE_INSIDE_OF_LOOP_COST (new_instance) = inside_cost;
> +      SLP_INSTANCE_STMT_COST_VEC (new_instance) = stmt_cost_vec;
>        SLP_INSTANCE_LOADS (new_instance) = loads;
>        SLP_INSTANCE_FIRST_LOAD_STMT (new_instance) = NULL;
>        SLP_INSTANCE_LOAD_PERMUTATION (new_instance) = load_permutation;
> @@ -1617,6 +1626,8 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>  
>        return true;
>      }
> +  else
> +    VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
>  
>    /* Failed to SLP.  */
>    /* Free the allocated memory.  */
> @@ -1812,6 +1823,7 @@ new_bb_vec_info (basic_block bb)
>  
>    BB_VINFO_GROUPED_STORES (res) = VEC_alloc (gimple, heap, 10);
>    BB_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 2);
> +  BB_VINFO_TARGET_COST_DATA (res) = init_cost (NULL);
>  
>    bb->aux = res;
>    return res;
> @@ -1846,6 +1858,7 @@ destroy_bb_vec_info (bb_vec_info bb_vinfo)
>    free_dependence_relations (BB_VINFO_DDRS (bb_vinfo));
>    VEC_free (gimple, heap, BB_VINFO_GROUPED_STORES (bb_vinfo));
>    VEC_free (slp_instance, heap, BB_VINFO_SLP_INSTANCES (bb_vinfo));
> +  destroy_cost_data (BB_VINFO_TARGET_COST_DATA (bb_vinfo));
>    free (bb_vinfo);
>    bb->aux = NULL;
>  }
> @@ -1918,8 +1931,8 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb
>  {
>    VEC (slp_instance, heap) *slp_instances = BB_VINFO_SLP_INSTANCES (bb_vinfo);
>    slp_instance instance;
> -  int i;
> -  unsigned int vec_outside_cost = 0, vec_inside_cost = 0, scalar_cost = 0;
> +  int i, j;
> +  unsigned int vec_inside_cost = 0, vec_outside_cost = 0, scalar_cost = 0;
>    unsigned int stmt_cost;
>    gimple stmt;
>    gimple_stmt_iterator si;
> @@ -1927,12 +1940,19 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb
>    stmt_vec_info stmt_info = NULL;
>    tree dummy_type = NULL;
>    int dummy = 0;
> +  stmt_vector_for_cost stmt_cost_vec;
> +  stmt_info_for_cost *ci;
>  
>    /* Calculate vector costs.  */
>    FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
>      {
>        vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
> -      vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
> +      stmt_cost_vec = SLP_INSTANCE_STMT_COST_VEC (instance);
> +
> +      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, j, ci)
> +	(void) add_stmt_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo),
> +			      ci->count, ci->kind,
> +			      vinfo_for_stmt (ci->stmt), ci->misalign);
>      }
>  
>    /* Calculate scalar cost.  */
> @@ -1961,6 +1981,9 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb
>        scalar_cost += stmt_cost;
>      }
>  
> +  /* Complete the target-specific cost calculation.  */
> +  vec_inside_cost = finish_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo));
> +
>    if (vect_print_dump_info (REPORT_COST))
>      {
>        fprintf (vect_dump, "Cost model analysis: \n");
> @@ -2072,7 +2095,7 @@ vect_slp_analyze_bb_1 (basic_block bb)
>        vect_mark_slp_stmts_relevant (SLP_INSTANCE_TREE (instance));
>      }
>  
> -   if (!vect_verify_datarefs_alignment (NULL, bb_vinfo))
> +  if (!vect_verify_datarefs_alignment (NULL, bb_vinfo))
>      {
>        if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
>          fprintf (vect_dump, "not vectorized: unsupported alignment in basic "
> @@ -2175,17 +2198,30 @@ vect_slp_analyze_bb (basic_block bb)
>  void
>  vect_update_slp_costs_according_to_vf (loop_vec_info loop_vinfo)
>  {
> -  unsigned int i, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +  unsigned int i, j, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>    VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
>    slp_instance instance;
> +  stmt_vector_for_cost stmt_cost_vec;
> +  stmt_info_for_cost *si;
>  
>    if (vect_print_dump_info (REPORT_SLP))
>      fprintf (vect_dump, "=== vect_update_slp_costs_according_to_vf ===");
>  
>    FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
> -    /* We assume that costs are linear in ncopies.  */
> -    SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf
> -      / SLP_INSTANCE_UNROLLING_FACTOR (instance);
> +    {
> +      /* We assume that costs are linear in ncopies.  */
> +      int ncopies = vf / SLP_INSTANCE_UNROLLING_FACTOR (instance);
> +
> +      /* Record the instance's instructions in the target cost model.
> +	 This was delayed until here because the count of instructions
> +	 isn't known beforehand.  */
> +      stmt_cost_vec = SLP_INSTANCE_STMT_COST_VEC (instance);
> +
> +      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, j, si)
> +	(void) add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> +			      si->count * ncopies, si->kind,
> +			      vinfo_for_stmt (si->stmt), si->misalign);
> +    }
>  }
>  
>  
> 
> 
>
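
For readers skimming the diff, the hook lifecycle in the quoted patch (init_cost, add_stmt_cost, finish_cost, destroy_cost_data), together with the deferred replay of recorded entries in vect_update_slp_costs_according_to_vf, can be sketched as standalone C. This is only an illustration under stated assumptions: every sketch_* name, the three-kind enum, and the cost table are hypothetical stand-ins, not GCC code. The real hooks also take a stmt_info and a misalignment, consult builtin_vectorization_cost, and apply the inner-loop weighting, all of which are omitted here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for enum vect_cost_for_stmt (three kinds only).  */
enum sketch_cost_kind { SK_VECTOR_STMT, SK_VECTOR_LOAD, SK_VEC_PERM };

/* Hypothetical cost table; a real target would call its
   builtin_vectorization_cost hook instead.  */
static int
sketch_kind_cost (enum sketch_cost_kind kind)
{
  switch (kind)
    {
    case SK_VECTOR_LOAD: return 2;
    case SK_VEC_PERM:    return 3;
    default:             return 1;
    }
}

/* Reduced analogue of stmt_info_for_cost: a count and a kind.  */
struct sketch_cost_entry
{
  int count;
  enum sketch_cost_kind kind;
};

/* Analogue of init_cost: allocate a zeroed accumulator.  */
static void *
sketch_init_cost (void)
{
  unsigned *cost = malloc (sizeof *cost);
  assert (cost != NULL);
  *cost = 0;
  return cost;
}

/* Analogue of add_stmt_cost: accumulate COUNT copies of KIND and
   return the amount added.  */
static unsigned
sketch_add_stmt_cost (void *data, int count, enum sketch_cost_kind kind)
{
  unsigned *cost = data;
  unsigned retval = (unsigned) (count * sketch_kind_cost (kind));
  *cost += retval;
  return retval;
}

/* Analogue of finish_cost: report the accumulated total.  */
static unsigned
sketch_finish_cost (void *data)
{
  return *(unsigned *) data;
}

/* Analogue of destroy_cost_data: release the accumulator.  */
static void
sketch_destroy_cost_data (void *data)
{
  free (data);
}

/* Replay N recorded entries, scaling each count by NCOPIES, in the
   same spirit as the loop over SLP_INSTANCE_STMT_COST_VEC.  */
unsigned
sketch_replay (const struct sketch_cost_entry *entries, size_t n,
	       int ncopies)
{
  void *data = sketch_init_cost ();
  unsigned total;
  size_t i;

  for (i = 0; i < n; i++)
    (void) sketch_add_stmt_cost (data, entries[i].count * ncopies,
				 entries[i].kind);

  total = sketch_finish_cost (data);
  sketch_destroy_cost_data (data);
  return total;
}
```

Recording the entries first and replaying them later is what lets the count be scaled once the vectorization factor is known. The sketch keeps that ordering but collapses everything else into a single accumulator.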

Patch

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 189292)
+++ gcc/doc/tm.texi	(working copy)
@@ -5792,6 +5792,22 @@  mode returned by @code{TARGET_VECTORIZE_PREFERRED_
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
+@deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
+This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates an unsigned integer for accumulating a single cost.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
+@end deftypefn
+
+@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_ADD_STMT_COST (void *@var{data}, int @var{count}, enum vect_cost_for_stmt @var{kind}, struct _stmt_vec_info *@var{stmt_info}, int @var{misalign})
+This hook should update the target-specific @var{data} in response to adding @var{count} copies of the given @var{kind} of statement to the body of a loop or basic block.  The default adds the builtin vectorizer cost for the copies of the statement to the accumulator, and returns the amount added.  The return value should be viewed as a tentative cost that may later be overridden.
+@end deftypefn
+
+@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_FINISH_COST (void *@var{data})
+This hook should complete calculations of the cost of vectorizing a loop or basic block based on @var{data}, and return that cost as an unsigned integer.  The default returns the value of the accumulator.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_VECTORIZE_DESTROY_COST_DATA (void *@var{data})
+This hook should release @var{data} and any related data structures allocated by TARGET_VECTORIZE_INIT_COST.  The default releases the accumulator.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_TM_LOAD (tree)
 This hook should return the built-in decl needed to load a vector of the given type within a transaction.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 189292)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5724,6 +5724,14 @@  mode returned by @code{TARGET_VECTORIZE_PREFERRED_
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_INIT_COST
+
+@hook TARGET_VECTORIZE_ADD_STMT_COST
+
+@hook TARGET_VECTORIZE_FINISH_COST
+
+@hook TARGET_VECTORIZE_DESTROY_COST_DATA
+
 @hook TARGET_VECTORIZE_BUILTIN_TM_LOAD
 
 @hook TARGET_VECTORIZE_BUILTIN_TM_STORE
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 189292)
+++ gcc/targhooks.c	(working copy)
@@ -996,6 +996,64 @@  default_autovectorize_vector_sizes (void)
   return 0;
 }
 
+/* By default, the cost model just accumulates the inside_loop costs for
+   a vectorized loop or block.  So allocate an unsigned int, set it to
+   zero, and return its address.  */
+
+void *
+default_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
+{
+  unsigned *cost = XNEW (unsigned);
+  *cost = 0;
+  return cost;
+}
+
+/* By default, the cost model looks up the cost of the given statement
+   kind and mode, multiplies it by the occurrence count, accumulates
+   it into the cost, and returns the cost added.  */
+
+unsigned
+default_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
+		       struct _stmt_vec_info *stmt_info, int misalign)
+{
+  unsigned *cost = (unsigned *) data;
+  unsigned retval = 0;
+
+  if (flag_vect_cost_model)
+    {
+      tree vectype = stmt_vectype (stmt_info);
+      int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
+							  misalign);
+      /* Statements in an inner loop relative to the loop being
+	 vectorized are weighted more heavily.  The value here is
+	 arbitrary and could potentially be improved with analysis.  */
+      if (stmt_in_inner_loop_p (stmt_info))
+	count *= 50;  /* FIXME.  */
+
+      retval = (unsigned) (count * stmt_cost);
+      *cost += retval;
+    }
+
+  return retval;
+}
+
+/* By default, the cost model just returns the accumulated
+   inside_loop cost.  */
+
+unsigned
+default_finish_cost (void *data)
+{
+  return *((unsigned *) data);
+}
+
+/* Free the cost data.  */
+
+void
+default_destroy_cost_data (void *data)
+{
+  free (data);
+}
+
 /* Determine whether or not a pointer mode is valid. Assume defaults
    of ptr_mode or Pmode - can be overridden.  */
 bool
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 189292)
+++ gcc/targhooks.h	(working copy)
@@ -90,6 +90,11 @@  default_builtin_support_vector_misalignment (enum
 					     int, bool);
 extern enum machine_mode default_preferred_simd_mode (enum machine_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
+extern void *default_init_cost (struct loop *);
+extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
+				       struct _stmt_vec_info *, int);
+extern unsigned default_finish_cost (void *);
+extern void default_destroy_cost_data (void *);
 
 /* These are here, and not in hooks.[ch], because not all users of
    hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 189292)
+++ gcc/target.def	(working copy)
@@ -1063,6 +1063,55 @@  DEFHOOK
  (const_tree mem_vectype, const_tree index_type, int scale),
  NULL)
 
+/* Target function to initialize the cost model for a loop or block.  */
+DEFHOOK
+(init_cost,
+ "This hook should initialize target-specific data structures in preparation "
+ "for modeling the costs of vectorizing a loop or basic block.  The default "
+ "allocates an unsigned integer for accumulating a single cost.  "
+ "If @var{loop_info} is non-NULL, it identifies the loop being vectorized; "
+ "otherwise a single block is being vectorized.",
+ void *,
+ (struct loop *loop_info),
+ default_init_cost)
+
+/* Target function to record N statements of the given kind using the
+   given vector type within the cost model data for the current loop
+   or block.  */
+DEFHOOK
+(add_stmt_cost,
+ "This hook should update the target-specific @var{data} in response to "
+ "adding @var{count} copies of the given @var{kind} of statement to the "
+ "body of a loop or basic block.  The default adds the builtin vectorizer "
+ "cost for the copies of the statement to the accumulator, and returns "
+ "the amount added.  The return value should be viewed as a tentative "
+ "cost that may later be overridden.",
+ unsigned,
+ (void *data, int count, enum vect_cost_for_stmt kind,
+  struct _stmt_vec_info *stmt_info, int misalign),
+ default_add_stmt_cost)
+
+/* Target function to calculate the total cost of the current vectorized
+   loop or block.  */
+DEFHOOK
+(finish_cost,
+ "This hook should complete calculations of the cost of vectorizing a loop "
+ "or basic block based on @var{data}, and return that cost as an unsigned "
+ "integer.  The default returns the value of the accumulator.",
+ unsigned,
+ (void *data),
+ default_finish_cost)
+
+/* Function to delete target-specific cost modeling data.  */
+DEFHOOK
+(destroy_cost_data,
+ "This hook should release @var{data} and any related data structures "
+ "allocated by TARGET_VECTORIZE_INIT_COST.  The default releases the "
+ "accumulator.",
+ void,
+ (void *data),
+ default_destroy_cost_data)
+
 HOOK_VECTOR_END (vectorize)
 
 #undef HOOK_PREFIX
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 189292)
+++ gcc/target.h	(working copy)
@@ -120,6 +120,13 @@  struct loop;
 /* This is defined in tree-ssa-alias.h.  */
 struct ao_ref_s;
 
+/* This is defined in tree-vectorizer.h.  */
+struct _stmt_vec_info;
+
+/* These are defined in tree-vect-stmts.c.  */
+extern tree stmt_vectype (struct _stmt_vec_info *);
+extern bool stmt_in_inner_loop_p (struct _stmt_vec_info *);
+
 /* Assembler instructions for creating various kinds of integer object.  */
 
 struct asm_int_op
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	(revision 189292)
+++ gcc/tree-vectorizer.h	(working copy)
@@ -71,6 +71,32 @@  enum vect_def_type {
                                    || ((D) == vect_double_reduction_def) \
                                    || ((D) == vect_nested_cycle))
 
+/* Structure to encapsulate information about a group of like
+   instructions to be presented to the target cost model.  */
+typedef struct _stmt_info_for_cost {
+  int count;
+  enum vect_cost_for_stmt kind;
+  gimple stmt;
+  int misalign;
+} stmt_info_for_cost;
+
+DEF_VEC_O (stmt_info_for_cost);
+DEF_VEC_ALLOC_O (stmt_info_for_cost, heap);
+
+typedef VEC(stmt_info_for_cost, heap) *stmt_vector_for_cost;
+
+static inline void
+add_stmt_info_to_vec (stmt_vector_for_cost *stmt_cost_vec, int count,
+		      enum vect_cost_for_stmt kind, gimple stmt, int misalign)
+{
+  stmt_info_for_cost si;
+  si.count = count;
+  si.kind = kind;
+  si.stmt = stmt;
+  si.misalign = misalign;
+  VEC_safe_push (stmt_info_for_cost, heap, *stmt_cost_vec, &si);
+}
+
 /************************************************************************
   SLP
  ************************************************************************/
@@ -96,7 +122,6 @@  typedef struct _slp_tree {
   struct
   {
     int outside_of_loop;     /* Statements generated outside loop.  */
-    int inside_of_loop;      /* Statements generated inside loop.  */
   } cost;
 } *slp_tree;
 
@@ -119,9 +144,11 @@  typedef struct _slp_instance {
   struct
   {
     int outside_of_loop;     /* Statements generated outside loop.  */
-    int inside_of_loop;      /* Statements generated inside loop.  */
   } cost;
 
+  /* Inside-loop costs.  */
+  stmt_vector_for_cost stmt_cost_vec;
+
   /* Loads permutation relatively to the stores, NULL if there is no
      permutation.  */
   VEC (int, heap) *load_permutation;
@@ -142,7 +169,7 @@  DEF_VEC_ALLOC_P(slp_instance, heap);
 #define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
 #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
 #define SLP_INSTANCE_OUTSIDE_OF_LOOP_COST(S)     (S)->cost.outside_of_loop
-#define SLP_INSTANCE_INSIDE_OF_LOOP_COST(S)      (S)->cost.inside_of_loop
+#define SLP_INSTANCE_STMT_COST_VEC(S)            (S)->stmt_cost_vec
 #define SLP_INSTANCE_LOAD_PERMUTATION(S)         (S)->load_permutation
 #define SLP_INSTANCE_LOADS(S)                    (S)->loads
 #define SLP_INSTANCE_FIRST_LOAD_STMT(S)          (S)->first_load
@@ -152,7 +179,6 @@  DEF_VEC_ALLOC_P(slp_instance, heap);
 #define SLP_TREE_VEC_STMTS(S)                    (S)->vec_stmts
 #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)          (S)->vec_stmts_size
 #define SLP_TREE_OUTSIDE_OF_LOOP_COST(S)         (S)->cost.outside_of_loop
-#define SLP_TREE_INSIDE_OF_LOOP_COST(S)          (S)->cost.inside_of_loop
 
 /* This structure is used in creation of an SLP tree.  Each instance
    corresponds to the same operand in a group of scalar stmts in an SLP
@@ -186,6 +212,7 @@  typedef struct _vect_peel_extended_info
   struct _vect_peel_info peel_info;
   unsigned int inside_cost;
   unsigned int outside_cost;
+  stmt_vector_for_cost stmt_cost_vec;
 } *vect_peel_extended_info;
 
 /*-----------------------------------------------------------------*/
@@ -274,6 +301,9 @@  typedef struct _loop_vec_info {
   /* Hash table used to choose the best peeling option.  */
   htab_t peeling_htab;
 
+  /* Cost data used by the target cost model.  */
+  void *target_cost_data;
+
   /* When we have grouped data accesses with gaps, we may introduce invalid
      memory accesses.  We peel the last iteration of the loop to prevent
      this.  */
@@ -307,6 +337,7 @@  typedef struct _loop_vec_info {
 #define LOOP_VINFO_REDUCTIONS(L)           (L)->reductions
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_HTAB(L)         (L)->peeling_htab
+#define LOOP_VINFO_TARGET_COST_DATA(L)     (L)->target_cost_data
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
@@ -350,13 +381,18 @@  typedef struct _bb_vec_info {
 
   /* All data dependences in the basic block.  */
   VEC (ddr_p, heap) *ddrs;
+
+  /* Cost data used by the target cost model.  */
+  void *target_cost_data;
+
 } *bb_vec_info;
 
-#define BB_VINFO_BB(B)              (B)->bb
-#define BB_VINFO_GROUPED_STORES(B)  (B)->grouped_stores
-#define BB_VINFO_SLP_INSTANCES(B)   (B)->slp_instances
-#define BB_VINFO_DATAREFS(B)        (B)->datarefs
-#define BB_VINFO_DDRS(B)            (B)->ddrs
+#define BB_VINFO_BB(B)               (B)->bb
+#define BB_VINFO_GROUPED_STORES(B)   (B)->grouped_stores
+#define BB_VINFO_SLP_INSTANCES(B)    (B)->slp_instances
+#define BB_VINFO_DATAREFS(B)         (B)->datarefs
+#define BB_VINFO_DDRS(B)             (B)->ddrs
+#define BB_VINFO_TARGET_COST_DATA(B) (B)->target_cost_data
 
 static inline bb_vec_info
 vec_info_for_bb (basic_block bb)
@@ -534,7 +570,6 @@  typedef struct _stmt_vec_info {
   struct
   {
     int outside_of_loop;     /* Statements generated outside loop.  */
-    int inside_of_loop;      /* Statements generated inside loop.  */
   } cost;
 
   /* The bb_vec_info with respect to which STMT is vectorized.  */
@@ -594,7 +629,6 @@  typedef struct _stmt_vec_info {
 
 #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_scope)
 #define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
-#define STMT_VINFO_INSIDE_OF_LOOP_COST(S)  (S)->cost.inside_of_loop
 
 #define HYBRID_SLP_STMT(S)                ((S)->slp_type == hybrid)
 #define PURE_SLP_STMT(S)                  ((S)->slp_type == pure_slp)
@@ -733,21 +767,9 @@  is_loop_header_bb_p (basic_block bb)
   return false;
 }
 
-/* Set inside loop vectorization cost.  */
+/* Set outside loop vectorization cost.  */
 
 static inline void
-stmt_vinfo_set_inside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node,
-				    int cost)
-{
-  if (slp_node)
-    SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
-  else
-    STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
-}
-
-/* Set inside loop vectorization cost.  */
-
-static inline void
 stmt_vinfo_set_outside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node,
 				     int cost)
 {
@@ -782,6 +804,41 @@  int vect_get_stmt_cost (enum vect_cost_for_stmt ty
                                                        dummy_type, dummy);
 }
 
+/* Alias targetm.vectorize.init_cost.  */
+
+static inline void *
+init_cost (struct loop *loop_info)
+{
+  return targetm.vectorize.init_cost (loop_info);
+}
+
+/* Alias targetm.vectorize.add_stmt_cost.  */
+
+static inline unsigned
+add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
+	       stmt_vec_info stmt_info, int misalign)
+{
+  return targetm.vectorize.add_stmt_cost (data, count, kind,
+					  stmt_info, misalign);
+}
+
+/* Alias targetm.vectorize.finish_cost.  */
+
+static inline unsigned
+finish_cost (void *data)
+{
+  return targetm.vectorize.finish_cost (data);
+}
+
+/* Alias targetm.vectorize.destroy_cost_data.  */
+
+static inline void
+destroy_cost_data (void *data)
+{
+  targetm.vectorize.destroy_cost_data (data);
+}
+
+
 /*-----------------------------------------------------------------*/
 /* Info on data references alignment.                              */
 /*-----------------------------------------------------------------*/
@@ -849,10 +906,14 @@  extern stmt_vec_info new_stmt_vec_info (gimple stm
 extern void free_stmt_vec_info (gimple stmt);
 extern tree vectorizable_function (gimple, tree, tree);
 extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
-                                    slp_tree);
+                                    slp_tree, stmt_vector_for_cost *);
 extern void vect_model_store_cost (stmt_vec_info, int, bool,
-				   enum vect_def_type, slp_tree);
-extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree);
+				   enum vect_def_type, slp_tree,
+				   stmt_vector_for_cost *);
+extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree,
+				  stmt_vector_for_cost *);
+extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
+				  enum vect_cost_for_stmt, stmt_vec_info, int);
 extern void vect_finish_stmt_generation (gimple, gimple,
                                          gimple_stmt_iterator *);
 extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
@@ -867,8 +928,10 @@  extern bool vect_analyze_stmt (gimple, bool *, slp
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
                                     tree, int, slp_tree);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
-                                unsigned int *, unsigned int *);
-extern void vect_get_store_cost (struct data_reference *, int, unsigned int *);
+				unsigned int *, unsigned int *,
+				stmt_vector_for_cost *);
+extern void vect_get_store_cost (struct data_reference *, int,
+				 unsigned int *, stmt_vector_for_cost *);
 extern bool vect_supportable_shift (enum tree_code, tree);
 extern void vect_get_vec_defs (tree, tree, gimple, VEC (tree, heap) **,
 			       VEC (tree, heap) **, slp_tree, int);
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 189292)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -852,6 +852,7 @@  new_loop_vec_info (struct loop *loop)
   LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
   LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
   LOOP_VINFO_PEELING_HTAB (res) = NULL;
+  LOOP_VINFO_TARGET_COST_DATA (res) = init_cost (loop);
   LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
 
   return res;
@@ -929,6 +930,8 @@  destroy_loop_vec_info (loop_vec_info loop_vinfo, b
   if (LOOP_VINFO_PEELING_HTAB (loop_vinfo))
     htab_delete (LOOP_VINFO_PEELING_HTAB (loop_vinfo));
 
+  destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
+
   free (loop_vinfo);
   loop->aux = NULL;
 }
@@ -1362,7 +1365,7 @@  vect_analyze_loop_operations (loop_vec_info loop_v
                            "not vectorized: relevant phi not supported: ");
                   print_gimple_stmt (vect_dump, phi, 0, TDF_SLIM);
                 }
-              return false;
+	      return false;
             }
         }
 
@@ -2498,7 +2501,6 @@  vect_estimate_min_profitable_iters (loop_vec_info
   int nbbs = loop->num_nodes;
   int npeel = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
   int peel_guard_costs = 0;
-  int innerloop_iters = 0, factor;
   VEC (slp_instance, heap) *slp_instances;
   slp_instance instance;
 
@@ -2544,20 +2546,11 @@  vect_estimate_min_profitable_iters (loop_vec_info
      TODO: Consider assigning different costs to different scalar
      statements.  */
 
-  /* FORNOW.  */
-  if (loop->inner)
-    innerloop_iters = 50; /* FIXME */
-
   for (i = 0; i < nbbs; i++)
     {
       gimple_stmt_iterator si;
       basic_block bb = bbs[i];
 
-      if (bb->loop_father == loop->inner)
- 	factor = innerloop_iters;
-      else
- 	factor = 1;
-
       for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
 	{
 	  gimple stmt = gsi_stmt (si);
@@ -2575,7 +2568,6 @@  vect_estimate_min_profitable_iters (loop_vec_info
                  || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))))
 	    continue;
 
-	  vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
 	  /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
 	     some of the "outside" costs are generated inside the outer-loop.  */
 	  vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
@@ -2592,14 +2584,9 @@  vect_estimate_min_profitable_iters (loop_vec_info
 		    = vinfo_for_stmt (pattern_def_stmt);
                   if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)
                       || STMT_VINFO_LIVE_P (pattern_def_stmt_info))
-		    {
-                      vec_inside_cost
-			+= STMT_VINFO_INSIDE_OF_LOOP_COST
-			   (pattern_def_stmt_info) * factor;
-                      vec_outside_cost
-			+= STMT_VINFO_OUTSIDE_OF_LOOP_COST
-			   (pattern_def_stmt_info);
-                    }
+		    vec_outside_cost
+		      += STMT_VINFO_OUTSIDE_OF_LOOP_COST
+		        (pattern_def_stmt_info);
 		}
 	    }
 	}
@@ -2725,11 +2712,12 @@  vect_estimate_min_profitable_iters (loop_vec_info
   /* Add SLP costs.  */
   slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
   FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
-    {
-      vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
-      vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
-    }
+    vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
 
+  /* Complete the target-specific cost calculation for the inside-of-loop
+     costs.  */
+  vec_inside_cost = finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
+
   /* Calculate number of iterations required to make the vector version
      profitable, relative to the loop bodies only.  The following condition
      must hold true:
@@ -2826,10 +2814,10 @@  vect_model_reduction_cost (stmt_vec_info stmt_info
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
 
-
   /* Cost of reduction op inside loop.  */
-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) 
-    += ncopies * vect_get_stmt_cost (vector_stmt);
+  unsigned inside_cost
+    = add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
+		     ncopies, vector_stmt, stmt_info, 0);
 
   stmt = STMT_VINFO_STMT (stmt_info);
 
@@ -2915,7 +2903,7 @@  vect_model_reduction_cost (stmt_vec_info stmt_info
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_reduction_cost: inside_cost = %d, "
-             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             "outside_cost = %d .", inside_cost,
              STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
 
   return true;
@@ -2929,16 +2917,20 @@  vect_model_reduction_cost (stmt_vec_info stmt_info
 static void
 vect_model_induction_cost (stmt_vec_info stmt_info, int ncopies)
 {
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+
   /* loop cost for vec_loop.  */
-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) 
-    = ncopies * vect_get_stmt_cost (vector_stmt);
+  unsigned inside_cost
+    = add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), ncopies,
+		     vector_stmt, stmt_info, 0);
+
   /* prologue cost for vec_init and vec_step.  */
   STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)  
     = 2 * vect_get_stmt_cost (scalar_to_vec);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_induction_cost: inside_cost = %d, "
-             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             "outside_cost = %d .", inside_cost,
              STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
 }
 
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 189292)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -1205,7 +1205,7 @@  vector_alignment_reachable_p (struct data_referenc
 
 /* Calculate the cost of the memory access represented by DR.  */
 
-static void
+static stmt_vector_for_cost
 vect_get_data_access_cost (struct data_reference *dr,
                            unsigned int *inside_cost,
                            unsigned int *outside_cost)
@@ -1216,15 +1216,19 @@  vect_get_data_access_cost (struct data_reference *
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   int ncopies = vf / nunits;
+  stmt_vector_for_cost stmt_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
 
   if (DR_IS_READ (dr))
-    vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost);
+    vect_get_load_cost (dr, ncopies, true, inside_cost,
+			outside_cost, &stmt_cost_vec);
   else
-    vect_get_store_cost (dr, ncopies, inside_cost);
+    vect_get_store_cost (dr, ncopies, inside_cost, &stmt_cost_vec);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
              "outside_cost = %d.", *inside_cost, *outside_cost);
+
+  return stmt_cost_vec;
 }
 
 
@@ -1317,6 +1321,7 @@  vect_peeling_hash_get_lowest_cost (void **slot, vo
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   VEC (data_reference_p, heap) *datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
   struct data_reference *dr;
+  stmt_vector_for_cost stmt_cost_vec = NULL;
 
   FOR_EACH_VEC_ELT (data_reference_p, datarefs, i, dr)
     {
@@ -1330,7 +1335,8 @@  vect_peeling_hash_get_lowest_cost (void **slot, vo
 
       save_misalignment = DR_MISALIGNMENT (dr);
       vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
-      vect_get_data_access_cost (dr, &inside_cost, &outside_cost);
+      stmt_cost_vec = vect_get_data_access_cost (dr, &inside_cost,
+						 &outside_cost);
       SET_DR_MISALIGNMENT (dr, save_misalignment);
     }
 
@@ -1342,6 +1348,7 @@  vect_peeling_hash_get_lowest_cost (void **slot, vo
     {
       min->inside_cost = inside_cost;
       min->outside_cost = outside_cost;
+      min->stmt_cost_vec = stmt_cost_vec;
       min->peel_info.dr = elem->dr;
       min->peel_info.npeel = elem->npeel;
     }
@@ -1356,11 +1363,13 @@  vect_peeling_hash_get_lowest_cost (void **slot, vo
 
 static struct data_reference *
 vect_peeling_hash_choose_best_peeling (loop_vec_info loop_vinfo,
-                                       unsigned int *npeel)
+                                       unsigned int *npeel,
+				       stmt_vector_for_cost *stmt_cost_vec)
 {
    struct _vect_peel_extended_info res;
 
    res.peel_info.dr = NULL;
+   res.stmt_cost_vec = NULL;
 
    if (flag_vect_cost_model)
      {
@@ -1377,6 +1386,7 @@  vect_peeling_hash_choose_best_peeling (loop_vec_in
      }
 
    *npeel = res.peel_info.npeel;
+   *stmt_cost_vec = res.stmt_cost_vec;
    return res.peel_info.dr;
 }
 
@@ -1493,6 +1503,7 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
   unsigned possible_npeel_number = 1;
   tree vectype;
   unsigned int nelements, mis, same_align_drs_max = 0;
+  stmt_vector_for_cost stmt_cost_vec = NULL;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ===");
@@ -1697,10 +1708,10 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
           unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
           unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
 
-          vect_get_data_access_cost (dr0, &load_inside_cost,
-                                     &load_outside_cost);
-          vect_get_data_access_cost (first_store, &store_inside_cost,
-                                     &store_outside_cost);
+          (void) vect_get_data_access_cost (dr0, &load_inside_cost,
+					    &load_outside_cost);
+          (void) vect_get_data_access_cost (first_store, &store_inside_cost,
+					    &store_outside_cost);
 
           /* Calculate the penalty for leaving FIRST_STORE unaligned (by
              aligning the load DR0).  */
@@ -1764,7 +1775,8 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
       gcc_assert (!all_misalignments_unknown);
 
       /* Choose the best peeling from the hash table.  */
-      dr0 = vect_peeling_hash_choose_best_peeling (loop_vinfo, &npeel);
+      dr0 = vect_peeling_hash_choose_best_peeling (loop_vinfo, &npeel,
+						   &stmt_cost_vec);
       if (!dr0 || !npeel)
         do_peeling = false;
     }
@@ -1848,6 +1860,8 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
 
       if (do_peeling)
         {
+	  stmt_info_for_cost *si;
+
           /* (1.2) Update the DR_MISALIGNMENT of each data reference DR_i.
              If the misalignment of DR_i is identical to that of dr0 then set
              DR_MISALIGNMENT (DR_i) to zero.  If the misalignment of DR_i and
@@ -1871,6 +1885,18 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
           if (vect_print_dump_info (REPORT_DETAILS))
             fprintf (vect_dump, "Peeling for alignment will be applied.");
 
+	  /* We've delayed passing the inside-loop peeling costs to the
+	     target cost model until we were sure peeling would happen.
+	     Do so now.  */
+	  if (stmt_cost_vec)
+	    {
+	      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, i, si)
+		(void) add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
+				      si->count, si->kind,
+				      vinfo_for_stmt (si->stmt), si->misalign);
+	      VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
+	    }
+
 	  stat = vect_verify_datarefs_alignment (loop_vinfo, NULL);
 	  gcc_assert (stat);
           return stat;
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 189292)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -41,6 +41,66 @@  along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 
 
+/* Return the vectorized type for the given statement.  */
+
+tree
+stmt_vectype (struct _stmt_vec_info *stmt_info)
+{
+  return STMT_VINFO_VECTYPE (stmt_info);
+}
+
+/* Return TRUE iff the given statement is in an inner loop relative to
+   the loop being vectorized.  */
+bool
+stmt_in_inner_loop_p (struct _stmt_vec_info *stmt_info)
+{
+  gimple stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block bb = gimple_bb (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  struct loop *loop;
+
+  if (!loop_vinfo)
+    return false;
+
+  loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+  return (bb->loop_father == loop->inner);
+}
+
+/* Record the cost of a statement, either by directly informing the
+   target model or by saving it in a vector for later processing.
+   Return a preliminary estimate of the statement's cost.  */
+
+unsigned
+record_stmt_cost (stmt_vector_for_cost *stmt_cost_vec, int count,
+		  enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
+		  int misalign)
+{
+  if (stmt_cost_vec)
+    {
+      tree vectype = stmt_vectype (stmt_info);
+      add_stmt_info_to_vec (stmt_cost_vec, count, kind,
+			    STMT_VINFO_STMT (stmt_info), misalign);
+      return (unsigned)
+	(targetm.vectorize.builtin_vectorization_cost (kind, vectype, misalign)
+	 * count);
+
+    }
+  else
+    {
+      loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+      bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+      void *target_cost_data;
+
+      if (loop_vinfo)
+	target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
+      else
+	target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
+
+      return add_stmt_cost (target_cost_data, count, kind, stmt_info, misalign);
+    }
+}
+
 /* Return a variable of type ELEM_TYPE[NELEMS].  */
 
 static tree
@@ -735,7 +795,8 @@  vect_mark_stmts_to_be_vectorized (loop_vec_info lo
 
 void
 vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies,
-			enum vect_def_type *dt, slp_tree slp_node)
+			enum vect_def_type *dt, slp_tree slp_node,
+			stmt_vector_for_cost *stmt_cost_vec)
 {
   int i;
   int inside_cost = 0, outside_cost = 0;
@@ -744,8 +805,6 @@  vect_model_simple_cost (stmt_vec_info stmt_info, i
   if (PURE_SLP_STMT (stmt_info))
     return;
 
-  inside_cost = ncopies * vect_get_stmt_cost (vector_stmt); 
-
   /* FORNOW: Assuming maximum 2 args per stmts.  */
   for (i = 0; i < 2; i++)
     {
@@ -753,13 +812,16 @@  vect_model_simple_cost (stmt_vec_info stmt_info, i
 	outside_cost += vect_get_stmt_cost (vector_stmt); 
     }
 
+  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
+  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
+
+  /* Pass the inside-of-loop statements to the target-specific cost model.  */
+  inside_cost = record_stmt_cost (stmt_cost_vec, ncopies, vector_stmt,
+				  stmt_info, 0);
+
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
              "outside_cost = %d .", inside_cost, outside_cost);
-
-  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
-  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
-  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
 }
 
 
@@ -773,18 +835,26 @@  vect_model_promotion_demotion_cost (stmt_vec_info
 				    enum vect_def_type *dt, int pwr)
 {
   int i, tmp;
-  int inside_cost = 0, outside_cost = 0, single_stmt_cost;
+  int inside_cost = 0, outside_cost = 0;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  void *target_cost_data;
 
   /* The SLP costs were already calculated during SLP tree build.  */
   if (PURE_SLP_STMT (stmt_info))
     return;
 
-  single_stmt_cost = vect_get_stmt_cost (vec_promote_demote);
+  if (loop_vinfo)
+    target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
+  else
+    target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
+
   for (i = 0; i < pwr + 1; i++)
     {
       tmp = (STMT_VINFO_TYPE (stmt_info) == type_promotion_vec_info_type) ?
 	(i + 1) : i;
-      inside_cost += vect_pow2 (tmp) * single_stmt_cost;
+      inside_cost += add_stmt_cost (target_cost_data, vect_pow2 (tmp),
+				    vec_promote_demote, stmt_info, 0);
     }
 
   /* FORNOW: Assuming maximum 2 args per stmts.  */
@@ -799,7 +869,6 @@  vect_model_promotion_demotion_cost (stmt_vec_info
              "outside_cost = %d .", inside_cost, outside_cost);
 
   /* Set the costs in STMT_INFO.  */
-  stmt_vinfo_set_inside_of_loop_cost (stmt_info, NULL, inside_cost);
   stmt_vinfo_set_outside_of_loop_cost (stmt_info, NULL, outside_cost);
 }
 
@@ -829,7 +898,7 @@  vect_cost_group_size (stmt_vec_info stmt_info)
 void
 vect_model_store_cost (stmt_vec_info stmt_info, int ncopies,
 		       bool store_lanes_p, enum vect_def_type dt,
-		       slp_tree slp_node)
+		       slp_tree slp_node, stmt_vector_for_cost *stmt_cost_vec)
 {
   int group_size;
   unsigned int inside_cost = 0, outside_cost = 0;
@@ -873,8 +942,10 @@  vect_model_store_cost (stmt_vec_info stmt_info, in
   if (!store_lanes_p && group_size > 1)
     {
       /* Uses a high and low interleave operation for each needed permute.  */
-      inside_cost = ncopies * exact_log2(group_size) * group_size
-        * vect_get_stmt_cost (vec_perm);
+
+      int nstmts = ncopies * exact_log2 (group_size) * group_size;
+      inside_cost = record_stmt_cost (stmt_cost_vec, nstmts, vec_perm,
+				      stmt_info, 0);
 
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d .",
@@ -882,14 +953,13 @@  vect_model_store_cost (stmt_vec_info stmt_info, in
     }
 
   /* Costs of the stores.  */
-  vect_get_store_cost (first_dr, ncopies, &inside_cost);
+  vect_get_store_cost (first_dr, ncopies, &inside_cost, stmt_cost_vec);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
              "outside_cost = %d .", inside_cost, outside_cost);
 
   /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
-  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
   stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
 }
 
@@ -897,15 +967,19 @@  vect_model_store_cost (stmt_vec_info stmt_info, in
 /* Calculate cost of DR's memory access.  */
 void
 vect_get_store_cost (struct data_reference *dr, int ncopies,
-                     unsigned int *inside_cost)
+		     unsigned int *inside_cost,
+		     stmt_vector_for_cost *stmt_cost_vec)
 {
   int alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
+  gimple stmt = DR_STMT (dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
 
   switch (alignment_support_scheme)
     {
     case dr_aligned:
       {
-        *inside_cost += ncopies * vect_get_stmt_cost (vector_store);
+	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+					  vector_store, stmt_info, 0);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_store_cost: aligned.");
@@ -915,14 +989,10 @@  vect_get_store_cost (struct data_reference *dr, in
 
     case dr_unaligned_supported:
       {
-        gimple stmt = DR_STMT (dr);
-        stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-        tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-
         /* Here, we assign an additional cost for the unaligned store.  */
-        *inside_cost += ncopies
-          * targetm.vectorize.builtin_vectorization_cost (unaligned_store,
-                                 vectype, DR_MISALIGNMENT (dr));
+	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+					  unaligned_store, stmt_info,
+					  DR_MISALIGNMENT (dr));
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_store_cost: unaligned supported by "
@@ -956,7 +1026,7 @@  vect_get_store_cost (struct data_reference *dr, in
 
 void
 vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, bool load_lanes_p,
-		      slp_tree slp_node)
+		      slp_tree slp_node, stmt_vector_for_cost *stmt_cost_vec)
 {
   int group_size;
   gimple first_stmt;
@@ -988,8 +1058,9 @@  vect_model_load_cost (stmt_vec_info stmt_info, int
   if (!load_lanes_p && group_size > 1)
     {
       /* Uses an even and odd extract operations for each needed permute.  */
-      inside_cost = ncopies * exact_log2(group_size) * group_size
-	* vect_get_stmt_cost (vec_perm);
+      int nstmts = ncopies * exact_log2 (group_size) * group_size;
+      inside_cost += record_stmt_cost (stmt_cost_vec, nstmts, vec_perm,
+				       stmt_info, 0);
 
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
@@ -1001,24 +1072,23 @@  vect_model_load_cost (stmt_vec_info stmt_info, int
     {
       /* N scalar loads plus gathering them into a vector.  */
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-      inside_cost += (vect_get_stmt_cost (scalar_load) * ncopies
-		      * TYPE_VECTOR_SUBPARTS (vectype));
-      inside_cost += ncopies
-	* targetm.vectorize.builtin_vectorization_cost (vec_construct,
-							vectype, 0);
+      inside_cost += record_stmt_cost (stmt_cost_vec,
+				       ncopies * TYPE_VECTOR_SUBPARTS (vectype),
+				       scalar_load, stmt_info, 0);
+      inside_cost += record_stmt_cost (stmt_cost_vec, ncopies, vec_construct,
+				       stmt_info, 0);
     }
   else
     vect_get_load_cost (first_dr, ncopies,
 			((!STMT_VINFO_GROUPED_ACCESS (stmt_info))
 			 || group_size > 1 || slp_node),
-			&inside_cost, &outside_cost);
+			&inside_cost, &outside_cost, stmt_cost_vec);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
              "outside_cost = %d .", inside_cost, outside_cost);
 
   /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
-  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
   stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
 }
 
@@ -1026,16 +1096,20 @@  vect_model_load_cost (stmt_vec_info stmt_info, int
 /* Calculate cost of DR's memory access.  */
 void
 vect_get_load_cost (struct data_reference *dr, int ncopies,
-                    bool add_realign_cost, unsigned int *inside_cost,
-                    unsigned int *outside_cost)
+		    bool add_realign_cost, unsigned int *inside_cost,
+		    unsigned int *outside_cost,
+		    stmt_vector_for_cost *stmt_cost_vec)
 {
   int alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
+  gimple stmt = DR_STMT (dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
 
   switch (alignment_support_scheme)
     {
     case dr_aligned:
       {
-        *inside_cost += ncopies * vect_get_stmt_cost (vector_load); 
+	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+					  vector_load, stmt_info, 0);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_load_cost: aligned.");
@@ -1044,14 +1118,11 @@  vect_get_load_cost (struct data_reference *dr, int
       }
     case dr_unaligned_supported:
       {
-        gimple stmt = DR_STMT (dr);
-        stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-        tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+	/* Here, we assign an additional cost for the unaligned load.  */
+	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+					  unaligned_load, stmt_info,
+					  DR_MISALIGNMENT (dr));
 
-        /* Here, we assign an additional cost for the unaligned load.  */
-        *inside_cost += ncopies
-          * targetm.vectorize.builtin_vectorization_cost (unaligned_load,
-                                           vectype, DR_MISALIGNMENT (dr));
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_load_cost: unaligned supported by "
                    "hardware.");
@@ -1060,14 +1131,17 @@  vect_get_load_cost (struct data_reference *dr, int
       }
     case dr_explicit_realign:
       {
-        *inside_cost += ncopies * (2 * vect_get_stmt_cost (vector_load)
-				   + vect_get_stmt_cost (vec_perm));
+	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies * 2,
+					  vector_load, stmt_info, 0);
+	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+					  vec_perm, stmt_info, 0);
 
         /* FIXME: If the misalignment remains fixed across the iterations of
            the containing loop, the following cost should be added to the
            outside costs.  */
         if (targetm.vectorize.builtin_mask_for_load)
-          *inside_cost += vect_get_stmt_cost (vector_stmt);
+	  *inside_cost += record_stmt_cost (stmt_cost_vec, 1, vector_stmt,
+					    stmt_info, 0);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_load_cost: explicit realign");
@@ -1094,8 +1168,10 @@  vect_get_load_cost (struct data_reference *dr, int
               *outside_cost += vect_get_stmt_cost (vector_stmt);
           }
 
-        *inside_cost += ncopies * (vect_get_stmt_cost (vector_load)
-				   + vect_get_stmt_cost (vec_perm));
+	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+					  vector_load, stmt_info, 0);
+	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+					  vec_perm, stmt_info, 0);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump,
@@ -1719,7 +1795,7 @@  vectorizable_call (gimple stmt, gimple_stmt_iterat
       STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_call ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
       return true;
     }
 
@@ -2433,7 +2509,7 @@  vectorizable_conversion (gimple stmt, gimple_stmt_
       if (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR)
         {
 	  STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
-	  vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
+	  vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
 	}
       else if (modifier == NARROW)
 	{
@@ -2841,7 +2917,7 @@  vectorizable_assignment (gimple stmt, gimple_stmt_
       STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_assignment ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
       return true;
     }
 
@@ -3187,7 +3263,7 @@  vectorizable_shift (gimple stmt, gimple_stmt_itera
       STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_shift ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
       return true;
     }
 
@@ -3565,7 +3641,7 @@  vectorizable_operation (gimple stmt, gimple_stmt_i
       STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_operation ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
       return true;
     }
 
@@ -3938,7 +4014,7 @@  vectorizable_store (gimple stmt, gimple_stmt_itera
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
-      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL);
+      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL, NULL);
       return true;
     }
 
@@ -4494,7 +4570,7 @@  vectorizable_load (gimple stmt, gimple_stmt_iterat
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
-      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL);
+      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL, NULL);
       return true;
     }
 
@@ -5934,7 +6010,6 @@  new_stmt_vec_info (gimple stmt, loop_vec_info loop
     STMT_VINFO_DEF_TYPE (res) = vect_internal_def;
 
   STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
-  STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
   STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
   STMT_SLP_TYPE (res) = loop_vect;
   GROUP_FIRST_ELEMENT (res) = NULL;
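
A note on the vect_get_load_cost changes in this file: several costs that
used to be one product are now recorded as separate entries, for example
in the dr_explicit_realign case.  The totals should be unchanged as long
as each recorded entry contributes count times a per-kind unit cost.  A
minimal standalone sketch of that arithmetic follows; the sk_* names and
the unit costs are made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Hypothetical unit costs; a real target supplies its own values.  */
enum { SK_VECTOR_LOAD_COST = 2, SK_VEC_PERM_COST = 3 };

/* Stand-in for recording one entry: assumed linear in COUNT.  */
unsigned
sk_record (unsigned count, unsigned unit_cost)
{
  return count * unit_cost;
}

/* Old form: a single product per copy.  */
unsigned
sk_old_explicit_realign (unsigned ncopies)
{
  return ncopies * (2 * SK_VECTOR_LOAD_COST + SK_VEC_PERM_COST);
}

/* New form: two recorded entries, loads first, then the permute.  */
unsigned
sk_new_explicit_realign (unsigned ncopies)
{
  unsigned cost = 0;
  cost += sk_record (ncopies * 2, SK_VECTOR_LOAD_COST);
  cost += sk_record (ncopies, SK_VEC_PERM_COST);
  return cost;
}
```

Both functions return the same value for any ncopies, which is what the
identical-cost-model check relies on under the linearity assumption.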
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c	(revision 189292)
+++ gcc/config/spu/spu.c	(working copy)
@@ -443,6 +443,18 @@  static void spu_setup_incoming_varargs (cumulative
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST spu_builtin_vectorization_cost
 
+#undef TARGET_VECTORIZE_INIT_COST
+#define TARGET_VECTORIZE_INIT_COST spu_init_cost
+
+#undef TARGET_VECTORIZE_ADD_STMT_COST
+#define TARGET_VECTORIZE_ADD_STMT_COST spu_add_stmt_cost
+
+#undef TARGET_VECTORIZE_FINISH_COST
+#define TARGET_VECTORIZE_FINISH_COST spu_finish_cost
+
+#undef TARGET_VECTORIZE_DESTROY_COST_DATA
+#define TARGET_VECTORIZE_DESTROY_COST_DATA spu_destroy_cost_data
+
 #undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
 #define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE spu_vector_alignment_reachable
 
@@ -6947,6 +6959,59 @@  spu_builtin_vectorization_cost (enum vect_cost_for
     }
 }
 
+/* Implement targetm.vectorize.init_cost.  */
+
+void *
+spu_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
+{
+  unsigned *cost = XNEW (unsigned);
+  *cost = 0;
+  return cost;
+}
+
+/* Implement targetm.vectorize.add_stmt_cost.  */
+
+unsigned
+spu_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
+		   struct _stmt_vec_info *stmt_info, int misalign)
+{
+  unsigned *cost = (unsigned *) data;
+  unsigned retval = 0;
+
+  if (flag_vect_cost_model)
+    {
+      tree vectype = stmt_vectype (stmt_info);
+      int stmt_cost = spu_builtin_vectorization_cost (kind, vectype, misalign);
+
+      /* Statements in an inner loop relative to the loop being
+	 vectorized are weighted more heavily.  The value here is
+	 arbitrary and could potentially be improved with analysis.  */
+      if (stmt_in_inner_loop_p (stmt_info))
+	count *= 50;  /* FIXME.  */
+
+      retval = (unsigned) (count * stmt_cost);
+      *cost += retval;
+    }
+
+  return retval;
+}
+
+/* Implement targetm.vectorize.finish_cost.  */
+
+unsigned
+spu_finish_cost (void *data)
+{
+  return *((unsigned *) data);
+}
+
+/* Implement targetm.vectorize.destroy_cost_data.  */
+
+void
+spu_destroy_cost_data (void *data)
+{
+  free (data);
+}
+
 /* Return true iff, data reference of TYPE can reach vector alignment (16)
    after applying N number of iterations.  This routine does not determine
    how may iterations are required to reach desired alignment.  */
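
For readers new to the interface, the four hooks form a simple lifecycle:
init_cost allocates the target's accumulator, add_stmt_cost is called for
each recorded group of statements, finish_cost reads the total, and
destroy_cost_data releases it.  Below is a minimal standalone sketch of
that lifecycle, not the GCC code itself.  The toy_* names and the cost
table are hypothetical, the in_inner_loop flag stands in for
stmt_in_inner_loop_p, and the flag_vect_cost_model check is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a few enum vect_cost_for_stmt values.  */
enum toy_cost_kind { TOY_SCALAR_LOAD, TOY_VECTOR_LOAD, TOY_VEC_PERM,
                     TOY_KIND_COUNT };

/* Made-up per-kind costs; a real target would take these from its
   builtin_vectorization_cost hook.  */
static const int toy_cost_table[TOY_KIND_COUNT] = { 1, 2, 3 };

/* Same shape as the *_init_cost hooks: allocate a zeroed accumulator.  */
void *
toy_init_cost (void)
{
  unsigned *cost = malloc (sizeof *cost);
  assert (cost != NULL);
  *cost = 0;
  return cost;
}

/* Same shape as the *_add_stmt_cost hooks: weight inner-loop statements
   by the arbitrary factor of 50, accumulate, and return the increment.  */
unsigned
toy_add_stmt_cost (void *data, int count, enum toy_cost_kind kind,
                   int in_inner_loop)
{
  unsigned *cost = data;
  unsigned retval;

  if (in_inner_loop)
    count *= 50;

  retval = (unsigned) (count * toy_cost_table[kind]);
  *cost += retval;
  return retval;
}

/* Same shape as the *_finish_cost hooks: report the accumulated total.  */
unsigned
toy_finish_cost (void *data)
{
  return *(unsigned *) data;
}

/* Same shape as the *_destroy_cost_data hooks.  */
void
toy_destroy_cost_data (void *data)
{
  free (data);
}

/* Walk through the whole lifecycle once.  */
unsigned
toy_example_total (void)
{
  void *data = toy_init_cost ();
  unsigned total;

  toy_add_stmt_cost (data, 2, TOY_VECTOR_LOAD, 0);  /* 2 * 2 = 4 */
  toy_add_stmt_cost (data, 1, TOY_VEC_PERM, 1);     /* 1 * 50 * 3 = 150 */
  total = toy_finish_cost (data);
  toy_destroy_cost_data (data);
  return total;
}
```

Keeping the accumulator behind a void pointer is what lets a target later
swap the single unsigned for a richer structure without touching the
callers.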
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 189292)
+++ gcc/config/i386/i386.c	(working copy)
@@ -40132,6 +40132,59 @@  ix86_autovectorize_vector_sizes (void)
   return (TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
 }
 
+/* Implement targetm.vectorize.init_cost.  */
+
+void *
+ix86_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
+{
+  unsigned *cost = XNEW (unsigned);
+  *cost = 0;
+  return cost;
+}
+
+/* Implement targetm.vectorize.add_stmt_cost.  */
+
+unsigned
+ix86_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
+		    struct _stmt_vec_info *stmt_info, int misalign)
+{
+  unsigned *cost = (unsigned *) data;
+  unsigned retval = 0;
+
+  if (flag_vect_cost_model)
+    {
+      tree vectype = stmt_vectype (stmt_info);
+      int stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
+
+      /* Statements in an inner loop relative to the loop being
+	 vectorized are weighted more heavily.  The value here is
+	 arbitrary and could potentially be improved with analysis.  */
+      if (stmt_in_inner_loop_p (stmt_info))
+	count *= 50;  /* FIXME.  */
+
+      retval = (unsigned) (count * stmt_cost);
+      *cost += retval;
+    }
+
+  return retval;
+}
+
+/* Implement targetm.vectorize.finish_cost.  */
+
+unsigned
+ix86_finish_cost (void *data)
+{
+  return *((unsigned *) data);
+}
+
+/* Implement targetm.vectorize.destroy_cost_data.  */
+
+void
+ix86_destroy_cost_data (void *data)
+{
+  free (data);
+}
+
 /* Validate target specific memory model bits in VAL. */
 
 static unsigned HOST_WIDE_INT
@@ -40442,6 +40495,14 @@  ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_INIT_COST
+#define TARGET_VECTORIZE_INIT_COST ix86_init_cost
+#undef TARGET_VECTORIZE_ADD_STMT_COST
+#define TARGET_VECTORIZE_ADD_STMT_COST ix86_add_stmt_cost
+#undef TARGET_VECTORIZE_FINISH_COST
+#define TARGET_VECTORIZE_FINISH_COST ix86_finish_cost
+#undef TARGET_VECTORIZE_DESTROY_COST_DATA
+#define TARGET_VECTORIZE_DESTROY_COST_DATA ix86_destroy_cost_data
 
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 189292)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1288,6 +1288,14 @@  static const struct attribute_spec rs6000_attribut
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
   rs6000_preferred_simd_mode
+#undef TARGET_VECTORIZE_INIT_COST
+#define TARGET_VECTORIZE_INIT_COST rs6000_init_cost
+#undef TARGET_VECTORIZE_ADD_STMT_COST
+#define TARGET_VECTORIZE_ADD_STMT_COST rs6000_add_stmt_cost
+#undef TARGET_VECTORIZE_FINISH_COST
+#define TARGET_VECTORIZE_FINISH_COST rs6000_finish_cost
+#undef TARGET_VECTORIZE_DESTROY_COST_DATA
+#define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
@@ -3563,6 +3571,59 @@  rs6000_preferred_simd_mode (enum machine_mode mode
   return word_mode;
 }
 
+/* Implement targetm.vectorize.init_cost.  */
+
+void *
+rs6000_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
+{
+  unsigned *cost = XNEW (unsigned);
+  *cost = 0;
+  return cost;
+}
+
+/* Implement targetm.vectorize.add_stmt_cost.  */
+
+unsigned
+rs6000_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
+		      struct _stmt_vec_info *stmt_info, int misalign)
+{
+  unsigned *cost = (unsigned *) data;
+  unsigned retval = 0;
+
+  if (flag_vect_cost_model)
+    {
+      tree vectype = stmt_vectype (stmt_info);
+      int stmt_cost = rs6000_builtin_vectorization_cost (kind, vectype,
+							 misalign);
+      /* Statements in an inner loop relative to the loop being
+	 vectorized are weighted more heavily.  The value here is
+	 arbitrary and could potentially be improved with analysis.  */
+      if (stmt_in_inner_loop_p (stmt_info))
+	count *= 50;  /* FIXME.  */
+
+      retval = (unsigned) (count * stmt_cost);
+      *cost += retval;
+    }
+
+  return retval;
+}
+
+/* Implement targetm.vectorize.finish_cost.  */
+
+unsigned
+rs6000_finish_cost (void *data)
+{
+  return *((unsigned *) data);
+}
+
+/* Implement targetm.vectorize.destroy_cost_data.  */
+
+void
+rs6000_destroy_cost_data (void *data)
+{
+  free (data);
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
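
The tree-vect-slp.c changes that follow do not hand SLP costs to the
target right away.  Each instance keeps a vector of (count, kind, stmt,
misalign) entries, and those entries are replayed through add_stmt_cost
once the vectorization factor is known, with the count scaled by
vf / unrolling_factor.  A minimal standalone sketch of that
record-then-replay idea is below.  The dv_* names, the fixed-size array,
and the unit costs are hypothetical, and the stmt and misalign fields are
dropped to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost kinds and unit costs, for illustration only.  */
enum dv_kind { DV_VECTOR_LOAD, DV_VEC_PERM, DV_KIND_COUNT };
static const unsigned dv_unit_cost[DV_KIND_COUNT] = { 2, 3 };

/* One deferred entry: how many statements of which kind.  */
struct dv_entry
{
  int count;
  enum dv_kind kind;
};

/* Small fixed-capacity stand-in for the per-instance cost vector.  */
#define DV_CAPACITY 8
struct dv_cost_vec
{
  struct dv_entry entries[DV_CAPACITY];
  size_t len;
};

/* Record an entry now; nothing is costed yet.  */
void
dv_record (struct dv_cost_vec *vec, int count, enum dv_kind kind)
{
  assert (vec->len < DV_CAPACITY);
  vec->entries[vec->len].count = count;
  vec->entries[vec->len].kind = kind;
  vec->len++;
}

/* Replay the entries once VF is known, scaling each count by the
   number of copies, and return the accumulated cost.  */
unsigned
dv_replay (const struct dv_cost_vec *vec, unsigned vf,
           unsigned unrolling_factor)
{
  unsigned ncopies = vf / unrolling_factor;
  unsigned total = 0;
  size_t j;

  for (j = 0; j < vec->len; j++)
    total += (unsigned) vec->entries[j].count * ncopies
             * dv_unit_cost[vec->entries[j].kind];
  return total;
}

/* Two loads and one permute, replayed with vf = 8 and unrolling
   factor 4, i.e. ncopies = 2.  */
unsigned
dv_replay_example (void)
{
  struct dv_cost_vec vec = { .len = 0 };

  dv_record (&vec, 2, DV_VECTOR_LOAD);
  dv_record (&vec, 1, DV_VEC_PERM);
  return dv_replay (&vec, 8, 4);
}
```

Deferring the costing this way keeps the assumption that costs are linear
in ncopies in one place, the replay loop, rather than baking it into a
precomputed total.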
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	(revision 189292)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -94,6 +94,7 @@  vect_free_slp_instance (slp_instance instance)
   vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
   VEC_free (int, heap, SLP_INSTANCE_LOAD_PERMUTATION (instance));
   VEC_free (slp_tree, heap, SLP_INSTANCE_LOADS (instance));
+  VEC_free (stmt_info_for_cost, heap, SLP_INSTANCE_STMT_COST_VEC (instance));
 }
 
 
@@ -122,7 +123,6 @@  vect_create_new_slp_node (VEC (gimple, heap) *scal
   SLP_TREE_VEC_STMTS (node) = NULL;
   SLP_TREE_CHILDREN (node) = VEC_alloc (slp_void_p, heap, nops);
   SLP_TREE_OUTSIDE_OF_LOOP_COST (node) = 0;
-  SLP_TREE_INSIDE_OF_LOOP_COST (node) = 0;
 
   return node;
 }
@@ -179,7 +179,8 @@  static bool
 vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
                              slp_tree slp_node, gimple stmt,
 			     int ncopies_for_cost, bool first,
-                             VEC (slp_oprnd_info, heap) **oprnds_info)
+                             VEC (slp_oprnd_info, heap) **oprnds_info,
+			     stmt_vector_for_cost *stmt_cost_vec)
 {
   tree oprnd;
   unsigned int i, number_of_oprnds;
@@ -320,7 +321,7 @@  vect_get_and_check_slp_defs (loop_vec_info loop_vi
 	      if (REFERENCE_CLASS_P (lhs))
 		/* Store.  */
                 vect_model_store_cost (stmt_info, ncopies_for_cost, false,
-                                        dt, slp_node);
+				       dt, slp_node, stmt_cost_vec);
 	      else
 		{
 		  enum vect_def_type dts[2];
@@ -329,7 +330,7 @@  vect_get_and_check_slp_defs (loop_vec_info loop_vi
 		  /* Not memory operation (we don't call this function for
 		     loads).  */
 		  vect_model_simple_cost (stmt_info, ncopies_for_cost, dts,
-					  slp_node);
+					  slp_node, stmt_cost_vec);
 		}
 	    }
 	}
@@ -446,12 +447,12 @@  vect_get_and_check_slp_defs (loop_vec_info loop_vi
 
 static bool
 vect_build_slp_tree (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
-                     slp_tree *node, unsigned int group_size,
-                     int *inside_cost, int *outside_cost,
+                     slp_tree *node, unsigned int group_size, int *outside_cost,
                      int ncopies_for_cost, unsigned int *max_nunits,
                      VEC (int, heap) **load_permutation,
                      VEC (slp_tree, heap) **loads,
-                     unsigned int vectorization_factor, bool *loads_permuted)
+                     unsigned int vectorization_factor, bool *loads_permuted,
+		     stmt_vector_for_cost *stmt_cost_vec)
 {
   unsigned int i;
   VEC (gimple, heap) *stmts = SLP_TREE_SCALAR_STMTS (*node);
@@ -470,7 +471,7 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
   HOST_WIDE_INT dummy;
   bool permutation = false;
   unsigned int load_place;
-  gimple first_load, prev_first_load = NULL;
+  gimple first_load = NULL, prev_first_load = NULL, old_first_load = NULL;
   VEC (slp_oprnd_info, heap) *oprnds_info;
   unsigned int nops;
   slp_oprnd_info oprnd_info;
@@ -711,7 +712,8 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 	      /* Store.  */
 	      if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node,
 						stmt, ncopies_for_cost,
-						(i == 0), &oprnds_info))
+						(i == 0), &oprnds_info,
+						stmt_cost_vec))
 		{
 	  	  vect_free_oprnd_info (&oprnds_info);
  		  return false;
@@ -754,6 +756,7 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
                   return false;
                 }
 
+	      old_first_load = first_load;
               first_load = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt));
               if (prev_first_load)
                 {
@@ -778,7 +781,9 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
               else
                 prev_first_load = first_load;
 
-              if (first_load == stmt)
+	      /* In some cases a group of loads is just the same load
+		 repeated N times.  Only analyze its cost once.  */
+              if (first_load == stmt && old_first_load != first_load)
                 {
                   first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
                   if (vect_supportable_dr_alignment (first_dr, false)
@@ -797,7 +802,8 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 
                   /* Analyze costs (for the first stmt in the group).  */
                   vect_model_load_cost (vinfo_for_stmt (stmt),
-                                        ncopies_for_cost, false, *node);
+                                        ncopies_for_cost, false, *node,
+					stmt_cost_vec);
                 }
 
               /* Store the place of this load in the interleaving chain.  In
@@ -871,7 +877,7 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 	  /* Find the def-stmts.  */
 	  if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node, stmt,
 					    ncopies_for_cost, (i == 0),
-					    &oprnds_info))
+					    &oprnds_info, stmt_cost_vec))
 	    {
 	      vect_free_oprnd_info (&oprnds_info);
 	      return false;
@@ -880,7 +886,6 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
     }
 
   /* Add the costs of the node to the overall instance costs.  */
-  *inside_cost += SLP_TREE_INSIDE_OF_LOOP_COST (*node);
   *outside_cost += SLP_TREE_OUTSIDE_OF_LOOP_COST (*node);
 
   /* Grouped loads were reached - stop the recursion.  */
@@ -889,11 +894,10 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
       VEC_safe_push (slp_tree, heap, *loads, *node);
       if (permutation)
         {
-
+	  gimple first_stmt = VEC_index (gimple, stmts, 0);
           *loads_permuted = true;
-          *inside_cost 
-            += targetm.vectorize.builtin_vectorization_cost (vec_perm, NULL, 0) 
-               * group_size;
+	  (void) record_stmt_cost (stmt_cost_vec, group_size, vec_perm,
+				   vinfo_for_stmt (first_stmt), 0);
         }
       else
         {
@@ -919,9 +923,10 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
       child = vect_create_new_slp_node (oprnd_info->def_stmts);
       if (!child
           || !vect_build_slp_tree (loop_vinfo, bb_vinfo, &child, group_size,
-				inside_cost, outside_cost, ncopies_for_cost,
-				max_nunits, load_permutation, loads,
-				vectorization_factor, loads_permuted))
+				   outside_cost, ncopies_for_cost,
+				   max_nunits, load_permutation, loads,
+				   vectorization_factor, loads_permuted,
+				   stmt_cost_vec))
         {
 	  if (child)
 	    oprnd_info->def_stmts = NULL;
@@ -1459,13 +1464,14 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
   tree vectype, scalar_type = NULL_TREE;
   gimple next;
   unsigned int vectorization_factor = 0;
-  int inside_cost = 0, outside_cost = 0, ncopies_for_cost, i;
+  int outside_cost = 0, ncopies_for_cost, i;
   unsigned int max_nunits = 0;
   VEC (int, heap) *load_permutation;
   VEC (slp_tree, heap) *loads;
   struct data_reference *dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
   bool loads_permuted = false;
   VEC (gimple, heap) *scalar_stmts;
+  stmt_vector_for_cost stmt_cost_vec;
 
   if (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)))
     {
@@ -1551,12 +1557,14 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
 
   load_permutation = VEC_alloc (int, heap, group_size * group_size);
   loads = VEC_alloc (slp_tree, heap, group_size);
+  stmt_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 10);
 
   /* Build the tree for the SLP instance.  */
   if (vect_build_slp_tree (loop_vinfo, bb_vinfo, &node, group_size,
-                           &inside_cost, &outside_cost, ncopies_for_cost,
+                           &outside_cost, ncopies_for_cost,
 			   &max_nunits, &load_permutation, &loads,
-			   vectorization_factor, &loads_permuted))
+			   vectorization_factor, &loads_permuted,
+			   &stmt_cost_vec))
     {
       /* Calculate the unrolling factor based on the smallest type.  */
       if (max_nunits > nunits)
@@ -1568,6 +1576,7 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
           if (vect_print_dump_info (REPORT_SLP))
             fprintf (vect_dump, "Build SLP failed: unrolling required in basic"
                                " block SLP");
+	  VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
           return false;
         }
 
@@ -1577,7 +1586,7 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
       SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
       SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
       SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (new_instance) = outside_cost;
-      SLP_INSTANCE_INSIDE_OF_LOOP_COST (new_instance) = inside_cost;
+      SLP_INSTANCE_STMT_COST_VEC (new_instance) = stmt_cost_vec;
       SLP_INSTANCE_LOADS (new_instance) = loads;
       SLP_INSTANCE_FIRST_LOAD_STMT (new_instance) = NULL;
       SLP_INSTANCE_LOAD_PERMUTATION (new_instance) = load_permutation;
@@ -1617,6 +1626,8 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
 
       return true;
     }
+  else
+    VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
 
   /* Failed to SLP.  */
   /* Free the allocated memory.  */
@@ -1812,6 +1823,7 @@  new_bb_vec_info (basic_block bb)
 
   BB_VINFO_GROUPED_STORES (res) = VEC_alloc (gimple, heap, 10);
   BB_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 2);
+  BB_VINFO_TARGET_COST_DATA (res) = init_cost (NULL);
 
   bb->aux = res;
   return res;
@@ -1846,6 +1858,7 @@  destroy_bb_vec_info (bb_vec_info bb_vinfo)
   free_dependence_relations (BB_VINFO_DDRS (bb_vinfo));
   VEC_free (gimple, heap, BB_VINFO_GROUPED_STORES (bb_vinfo));
   VEC_free (slp_instance, heap, BB_VINFO_SLP_INSTANCES (bb_vinfo));
+  destroy_cost_data (BB_VINFO_TARGET_COST_DATA (bb_vinfo));
   free (bb_vinfo);
   bb->aux = NULL;
 }
@@ -1918,8 +1931,8 @@  vect_bb_vectorization_profitable_p (bb_vec_info bb
 {
   VEC (slp_instance, heap) *slp_instances = BB_VINFO_SLP_INSTANCES (bb_vinfo);
   slp_instance instance;
-  int i;
-  unsigned int vec_outside_cost = 0, vec_inside_cost = 0, scalar_cost = 0;
+  int i, j;
+  unsigned int vec_inside_cost = 0, vec_outside_cost = 0, scalar_cost = 0;
   unsigned int stmt_cost;
   gimple stmt;
   gimple_stmt_iterator si;
@@ -1927,12 +1940,19 @@  vect_bb_vectorization_profitable_p (bb_vec_info bb
   stmt_vec_info stmt_info = NULL;
   tree dummy_type = NULL;
   int dummy = 0;
+  stmt_vector_for_cost stmt_cost_vec;
+  stmt_info_for_cost *ci;
 
   /* Calculate vector costs.  */
   FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
     {
       vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
-      vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
+      stmt_cost_vec = SLP_INSTANCE_STMT_COST_VEC (instance);
+
+      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, j, ci)
+	(void) add_stmt_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo),
+			      ci->count, ci->kind,
+			      vinfo_for_stmt (ci->stmt), ci->misalign);
     }
 
   /* Calculate scalar cost.  */
@@ -1961,6 +1981,9 @@  vect_bb_vectorization_profitable_p (bb_vec_info bb
       scalar_cost += stmt_cost;
     }
 
+  /* Complete the target-specific cost calculation.  */
+  vec_inside_cost = finish_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo));
+
   if (vect_print_dump_info (REPORT_COST))
     {
       fprintf (vect_dump, "Cost model analysis: \n");
@@ -2072,7 +2095,7 @@  vect_slp_analyze_bb_1 (basic_block bb)
       vect_mark_slp_stmts_relevant (SLP_INSTANCE_TREE (instance));
     }
 
-   if (!vect_verify_datarefs_alignment (NULL, bb_vinfo))
+  if (!vect_verify_datarefs_alignment (NULL, bb_vinfo))
     {
       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
         fprintf (vect_dump, "not vectorized: unsupported alignment in basic "
@@ -2175,17 +2198,30 @@  vect_slp_analyze_bb (basic_block bb)
 void
 vect_update_slp_costs_according_to_vf (loop_vec_info loop_vinfo)
 {
-  unsigned int i, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  unsigned int i, j, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
   slp_instance instance;
+  stmt_vector_for_cost stmt_cost_vec;
+  stmt_info_for_cost *si;
 
   if (vect_print_dump_info (REPORT_SLP))
     fprintf (vect_dump, "=== vect_update_slp_costs_according_to_vf ===");
 
   FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
-    /* We assume that costs are linear in ncopies.  */
-    SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf
-      / SLP_INSTANCE_UNROLLING_FACTOR (instance);
+    {
+      /* We assume that costs are linear in ncopies.  */
+      int ncopies = vf / SLP_INSTANCE_UNROLLING_FACTOR (instance);
+
+      /* Record the instance's statement costs in the target cost model.
+	 This was delayed until here because the copy count (NCOPIES)
+	 depends on the final vectorization factor.  */
+      stmt_cost_vec = SLP_INSTANCE_STMT_COST_VEC (instance);
+
+      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, j, si)
+	(void) add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
+			      si->count * ncopies, si->kind,
+			      vinfo_for_stmt (si->stmt), si->misalign);
+    }
 }