Vectorizer cost model outside-cost changes

Submitter William J. Schmidt
Date July 24, 2012, 2:19 a.m.
Message ID <1343096379.4166.13.camel@oc2474580526.ibm.com>
Permalink /patch/172772/
State New

Comments

William J. Schmidt - July 24, 2012, 2:19 a.m.
This patch completes the conversion of the vectorizer cost model to use
target hooks for recording vectorization information and calculating
costs.  Previous work handled the costs inside the loop body or basic
block being vectorized.  This patch similarly converts the prologue and
epilogue costs.
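
For readers new to these hooks, the three-way split can be pictured with a
small standalone sketch.  It is not the GCC code: the toy_* names and the
stmt_cost parameter (standing in for the target's per-statement cost lookup)
are assumptions made for illustration.  Only the enum mirrors the one this
patch adds to target.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Mirrors the enum added to target.h by this patch.  */
enum vect_cost_model_location {
  vect_prologue = 0,
  vect_body = 1,
  vect_epilogue = 2
};

/* Hypothetical stand-in for the init_cost hook: allocate three zeroed
   accumulators, one per location.  Returns NULL if allocation fails.  */
static unsigned *
toy_init_cost (void)
{
  return (unsigned *) calloc (3, sizeof (unsigned));
}

/* Hypothetical stand-in for the add_stmt_cost hook.  STMT_COST replaces
   the target's cost lookup; the result is added to the accumulator
   selected by WHERE and also returned.  */
static unsigned
toy_add_stmt_cost (unsigned *cost, int count, int stmt_cost,
                   enum vect_cost_model_location where)
{
  assert (cost != NULL);
  unsigned added = (unsigned) (count * stmt_cost);
  cost[where] += added;
  return added;
}

/* Hypothetical stand-in for the finish_cost hook: hand back the three
   accumulated totals through the output pointers.  */
static void
toy_finish_cost (const unsigned *cost, unsigned *prologue_cost,
                 unsigned *body_cost, unsigned *epilogue_cost)
{
  assert (cost != NULL);
  *prologue_cost = cost[vect_prologue];
  *body_cost = cost[vect_body];
  *epilogue_cost = cost[vect_epilogue];
}
```

A caller would pair toy_init_cost with free () once the totals have been
read, much as the real hooks are paired with a destroy_cost_data hook.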

As before, I first verified that the new model provides the same results
as the old model on the regression testsuite and on SPEC CPU2006.  I
then removed the old model, rather than submitting an intermediate patch
with both present.  I have a patch that shows both if it's needed for
reference.

Also as before, I found an error in the old cost model wherein prologue
costs of phi reduction statements were not being considered during the
final vectorization decision.  I have fixed this in the new model; thus,
this version of the cost model will be slightly more conservative than
the original.  I am currently running SPEC tests to ensure there aren't
any resulting degradations.
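
To see why counting an extra prologue cost makes the model more
conservative, here is a rough sketch under a simplified break-even
assumption: vectorizing pays off for n iterations once
scalar_iter_cost * n exceeds vec_body_cost * n / vf + outside_cost.  The
helper name and the inequality are illustrative only and are not the logic
of vect_estimate_min_profitable_iters.

```c
#include <assert.h>

/* Hypothetical helper, not a GCC function.  Returns the smallest
   iteration count N for which
     scalar_iter_cost * N > vec_body_cost * N / vf + outside_cost
   holds, or -1 if the vector body is never cheaper per iteration.
   Both sides are multiplied by VF to stay in integer arithmetic.  */
static long
toy_min_profitable_iters (unsigned long scalar_iter_cost,
                          unsigned long vec_body_cost,
                          unsigned long vf,
                          unsigned long outside_cost)
{
  assert (vf > 0);

  /* No break-even point if VF scalar iterations cost no more than one
     vector iteration.  */
  if (scalar_iter_cost * vf <= vec_body_cost)
    return -1;

  for (long n = 1;; n++)
    {
      unsigned long un = (unsigned long) n;
      if (scalar_iter_cost * un * vf > vec_body_cost * un + outside_cost * vf)
        return n;
    }
}
```

With made-up costs (scalar 4, vector body 6, vf 4), raising the outside
cost from 10 to 18 to account for a previously ignored prologue term moves
the break-even point from 5 iterations to 8.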

One thing that could be done in future for further cleanup would be to
handle the scalar iteration cost in a similar manner.  Right now this is
dealt with by recording N scalar_stmts, where N is the length of the
scalar iteration; as with the old model, there is no attempt to
differentiate between different scalar statements.  This results in some
hackish stuff in, e.g., tree-vect-stmts.c:record_stmt_cost (), where we
have to deal with the fact that we may not have a stmt_info for the
statement being recorded.  A stmt_info is missing only for these
aggregated scalar_stmt costs.
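
The NULL stmt_info case can be sketched in isolation as follows.  The
struct and function names are hypothetical, reduced stand-ins for
illustration.  The guard and the inner-loop weight of 50 follow the default
hook in this patch, where that weight is itself marked as arbitrary.

```c
#include <assert.h>
#include <stddef.h>

enum vect_cost_model_location {
  vect_prologue = 0,
  vect_body = 1,
  vect_epilogue = 2
};

/* Hypothetical, much-reduced stand-in for GCC's stmt_vec_info.  */
struct toy_stmt_info {
  int in_inner_loop;   /* Nonzero if the statement is in an inner loop.  */
};

/* Cost of COUNT copies of a statement whose unit cost is STMT_COST.
   STMT_INFO may be NULL, as for aggregated scalar_stmt costs; in that
   case no inner-loop weighting is applied.  The weighting applies only
   to body costs.  */
static unsigned
toy_weighted_cost (int count, int stmt_cost,
                   const struct toy_stmt_info *stmt_info,
                   enum vect_cost_model_location where)
{
  assert (count >= 0 && stmt_cost >= 0);

  if (where == vect_body && stmt_info != NULL && stmt_info->in_inner_loop)
    count *= 50;   /* Arbitrary weight, as in the default hook.  */

  return (unsigned) (count * stmt_cost);
}
```

Checking stmt_info before dereferencing it is what lets the aggregated
scalar costs pass through the same interface as per-statement vector costs.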

Bootstrapped and tested on powerpc-unknown-linux-gnu with no new
regressions.  Assuming the SPEC performance tests come out ok, is this
ok for trunk?

Thanks!
Bill


2012-07-23  Bill Schmidt  <wschmidt@linux.ibm.com>

	* doc/tm.texi: Regenerate.
	* targhooks.c (default_init_cost): Add prologue and epilogue costs.
	(default_add_stmt_cost): Likewise; also handle NULL stmt_info.
	(default_finish_cost): Add prologue and epilogue costs.
	* targhooks.h (default_add_stmt_cost): Change parameter list.
	(default_finish_cost): Likewise.
	* target.def (init_cost): Change documentation string.
	(add_stmt_cost): Change documentation string and parameter list.
	(finish_cost): Likewise.
	* target.h (vect_cost_model_location): New enum.
	* tree-vectorizer.h (struct _slp_tree): Remove cost substruct.
	(struct _slp_instance): Remove cost substruct; rename stmt_cost_vec
	to body_cost_vec.
	(SLP_INSTANCE_OUTSIDE_OF_LOOP_COST): Remove.
	(SLP_INSTANCE_STMT_COST_VEC): Rename to SLP_INSTANCE_BODY_COST_VEC.
	(SLP_TREE_OUTSIDE_OF_LOOP_COST): Remove.
	(struct _vect_peel_extended_info): Rename stmt_cost_vec to
	body_cost_vec.
	(struct _stmt_vec_info): Remove cost substruct.
	(STMT_VINFO_OUTSIDE_OF_LOOP_COST): Remove.
	(stmt_vinfo_set_outside_of_loop_cost): Remove.
	(builtin_vectorization_cost): New function.
	(vect_get_stmt_cost): Change to use builtin_vectorization_cost.
	(add_stmt_cost): Change parameter list.
	(finish_cost): Likewise.
	(vect_model_simple_cost): Likewise.
	(vect_model_store_cost): Likewise.
	(vect_model_load_cost): Likewise.
	(record_stmt_cost): Likewise.
	(vect_get_load_cost): Likewise.
	(vect_get_known_peeling_cost): Likewise.
	* tree-vect-loop.c (vect_get_known_peeling_cost): Change parameter
	list; call record_stmt_cost for prologue and epilogue costs.
	(vect_estimate_min_profitable_iters): Call add_stmt_cost for
	prologue and epilogue costs; remove computation of vec_outside_cost;
	return vec_prologue_cost and vec_epilogue_cost from finish_cost.
	(vect_model_reduction_cost): Revise call to add_stmt_cost for body
	costs; call add_stmt_cost for prologue and epilogue costs.
	(vect_model_induction_cost): Revise call to add_stmt_cost for body
	costs; call add_stmt_cost for prologue costs.
	* tree-vect-data-refs.c (vect_get_data_access_cost): Change parameter
	list for function and arguments for calls to vect_get_load_cost and
	vect_get_store_cost.
	(vect_peeling_hash_get_lowest_cost): Change argument list for calls to
	vect_get_data_access_cost and vect_get_known_peeling_cost; use
	temporary vectors prologue_cost_vec and epilogue_cost_vec for the
	latter call and discard their results; rename stmt_cost_vec to
	body_cost_vec; correct possible storage leak for body_cost_vec.
	(vect_peeling_hash_choose_best_peeling): Rename stmt_cost_vec to
	body_cost_vec.
	(vect_enhance_data_refs_alignment): Rename stmt_cost_vec to
	body_cost_vec; add extra dummy parameter on calls to
	vect_get_data_access_cost; tolerate null si->stmt; add vect_body to
	argument list on call to add_stmt_cost.
	* tree-vect-stmts.c (record_stmt_cost): Change parameter list;
	rename stmt_cost_vec to body_cost_vec; tolerate null stmt_info; call
	builtin_vectorization_cost; add "where" parameter on call to
	add_stmt_cost.
	(vect_model_simple_cost): Change parameter list; call record_stmt_cost
	for prologue costs; remove call to stmt_vinfo_set_outside_of_loop_cost;
	rename stmt_cost_vec to body_cost_vec.
	(vect_model_promotion_demotion_cost): Add vect_body argument to call
	to add_stmt_cost; call add_stmt_cost for prologue costs; remove call
	to stmt_vinfo_set_outside_of_loop_cost.
	(vect_model_store_cost): Change parameter list; call record_stmt_cost
	for prologue costs; add vect_body argument to call to record_stmt_cost;
	rename stmt_cost_vec to body_cost_vec; remove call to
	stmt_vinfo_set_outside_of_loop_cost.
	(vect_get_store_cost): Rename stmt_cost_vec to body_cost_vec; add
	vect_body argument to calls to record_stmt_cost.
	(vect_model_load_cost): Change parameter list; rename stmt_cost_vec to
	body_cost_vec; add vect_body argument to calls to record_stmt_cost;
	remove call to stmt_vinfo_set_outside_of_loop_cost.
	(vect_get_load_cost): Change parameter list; rename stmt_cost_vec to
	body_cost_vec; add vect_body argument to calls to record_stmt_cost;
	call record_stmt_cost for prologue costs.
	(vectorizable_store): Change argument list for call to
	vect_model_store_cost.
	(vectorizable_load): Change argument list for call to
	vect_model_load_cost.
	(new_stmt_vec_info): Remove assignment to
	STMT_VINFO_OUTSIDE_OF_LOOP_COST.
	* config/spu/spu.c (spu_init_cost): Add prologue and epilogue costs.
	(spu_add_stmt_cost): Likewise; also handle NULL stmt_info.
	(spu_finish_cost): Add prologue and epilogue costs.
	* config/i386/i386.c (i386_init_cost): Add prologue and epilogue costs.
	(i386_add_stmt_cost): Likewise; also handle NULL stmt_info.
	(i386_finish_cost): Add prologue and epilogue costs.
	* config/rs6000/rs6000.c (rs6000_init_cost): Add prologue and epilogue
	costs.
	(rs6000_add_stmt_cost): Likewise; also handle NULL stmt_info.
	(rs6000_finish_cost): Add prologue and epilogue costs.
	* tree-vect-slp.c (vect_free_slp_instance): Rename
	SLP_INSTANCE_STMT_COST_VEC to SLP_INSTANCE_BODY_COST_VEC.
	(vect_create_new_slp_node): Remove assignment to
	SLP_TREE_OUTSIDE_OF_LOOP_COST.
	(vect_get_and_check_slp_defs): Change parameter list; change argument
	lists to calls to vect_model_store_cost and vect_model_simple_cost.
	(vect_build_slp_tree): Change parameter list; change argument lists
	to calls to vect_model_load_cost, vect_get_and_check_slp_defs, and
	recursive self-calls; remove setting of outside_cost from
	SLP_TREE_OUTSIDE_OF_LOOP_COST; add vect_body argument to call to
	record_stmt_cost.
	(vect_analyze_slp_instance): Rename stmt_cost_vec to body_cost_vec;
	rename SLP_INSTANCE_STMT_COST_VEC to SLP_INSTANCE_BODY_COST_VEC;
	remove assignment to SLP_INSTANCE_OUTSIDE_OF_LOOP_COST; record SLP
	prologue costs.
	(vect_bb_vectorization_profitable_p): Rename stmt_cost_vec to
	body_cost_vec; handle null ci->stmt; add vect_body argument to call
	to add_stmt_cost; simplify calls to targetm.vectorize.
	builtin_vectorization_cost; return vec_prologue_cost and
	vec_epilogue_cost from finish_cost.
	(vect_update_slp_costs_according_to_vf): Rename stmt_cost_vec to
	body_cost_vec; add vect_body argument to call to add_stmt_cost.
Richard Guenther - July 24, 2012, 8:57 a.m.
On Mon, 23 Jul 2012, William J. Schmidt wrote:

> This patch completes the conversion of the vectorizer cost model to use
> target hooks for recording vectorization information and calculating
> costs.  Previous work handled the costs inside the loop body or basic
> block being vectorized.  This patch similarly converts the prologue and
> epilogue costs.
> 
> As before, I first verified that the new model provides the same results
> as the old model on the regression testsuite and on SPEC CPU2006.  I
> then removed the old model, rather than submitting an intermediate patch
> with both present.  I have a patch that shows both if it's needed for
> reference.
> 
> Also as before, I found an error in the old cost model wherein prologue
> costs of phi reduction statements were not being considered during the
> final vectorization decision.  I have fixed this in the new model; thus,
> this version of the cost model will be slightly more conservative than
> the original.  I am currently running SPEC tests to ensure there aren't
> any resulting degradations.
> 
> One thing that could be done in future for further cleanup would be to
> handle the scalar iteration cost in a similar manner.  Right now this is
> dealt with by recording N scalar_stmts, where N is the length of the
> scalar iteration; as with the old model, there is no attempt to
> differentiate between different scalar statements.  This results in some
> hackish stuff in, e.g., tree-vect-stmts.c:record_stmt_cost (), where we
> have to deal with the fact that we may not have a stmt_info for the
> statement being recorded.  This is only true for these aggregated
> scalar_stmt costs.
> 
> Bootstrapped and tested on powerpc-unknown-linux-gnu with no new
> regressions.  Assuming the SPEC performance tests come out ok, is this
> ok for trunk?

So all costs we query from the backend even for the prologue/epilogue
are costs for vector stmts (like inits of invariant vectors or
outer-loop parts in outer loop vectorization)?

Ok in that case.

Thanks,
Richard.

> Thanks!
> Bill
> 
> 
> 2012-07-23  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> 	* doc/tm.texi: Regenerate.
> 	* targhooks.c (default_init_cost): Add prologue and epilogue costs.
> 	(default_add_stmt_cost): Likewise; also handle NULL stmt_info.
> 	(default_finish_cost): Add prologue and epilogue costs.
> 	* targhooks.h (default_add_stmt_cost): Change parameter list.
> 	(default_finish_cost): Likewise.
> 	* target.def (init_cost): Change documentation string.
> 	(add_stmt_cost): Change documentation string and parameter list.
> 	(finish_cost): Likewise.
> 	* target.h (vect_cost_model_location): New enum.
> 	* tree-vectorizer.h (struct _slp_tree): Remove cost substruct.
> 	(struct _slp_instance): Remove cost substruct; rename stmt_cost_vec
> 	to body_cost_vec.
> 	(SLP_INSTANCE_OUTSIDE_OF_LOOP_COST): Remove.
> 	(SLP_INSTANCE_STMT_COST_VEC): Rename to SLP_INSTANCE_BODY_COST_VEC.
> 	(SLP_TREE_OUTSIDE_OF_LOOP_COST): Remove.
> 	(struct _vect_peel_extended_info): Rename stmt_cost_vec to
> 	body_cost_vec.
> 	(struct _stmt_vec_info): Remove cost substruct.
> 	(STMT_VINFO_OUTSIDE_OF_LOOP_COST): Remove.
> 	(stmt_vinfo_set_outside_of_loop_cost): Remove.
> 	(builtin_vectorization_cost): New function.
> 	(vect_get_stmt_cost): Change to use builtin_vectorization_cost.
> 	(add_stmt_cost): Change parameter list.
> 	(finish_cost): Likewise.
> 	(vect_model_simple_cost): Likewise.
> 	(vect_model_store_cost): Likewise.
> 	(vect_model_load_cost): Likewise.
> 	(record_stmt_cost): Likewise.
> 	(vect_get_load_cost): Likewise.
> 	(vect_get_known_peeling_cost): Likewise.
> 	* tree-vect-loop.c (vect_get_known_peeling_cost): Change parameter
> 	list; call record_stmt_cost for prologue and epilogue costs.
> 	(vect_estimate_min_profitable_iters): Call add_stmt_cost for
> 	prologue and epilogue costs; remove computation of vec_outside_cost;
> 	return vec_prologue_cost and vec_epilogue_cost from finish_cost.
> 	(vect_model_reduction_cost): Revise call to add_stmt_cost for body
> 	costs; call add_stmt_cost for prologue and epilogue costs.
> 	(vect_model_induction_cost): Revise call to add_stmt_cost for body
> 	costs; call add_stmt_cost for prologue costs.
> 	* tree-vect-data-refs.c (vect_get_data_access_cost): Change parameter
> 	list for function and arguments for calls to vect_get_load_cost and
> 	vect_get_store_cost.
> 	(vect_peeling_hash_get_lowest_cost): Change argument list for calls to
> 	vect_get_data_access_cost and vect_get_known_peeling_cost; use
> 	temporary vectors prologue_cost_vec and epilogue_cost_vec for the
> 	latter call and discard their results; rename stmt_cost_vec to
> 	body_cost_vec; correct possible storage leak for body_cost_vec.
> 	(vect_peeling_hash_choose_best_peeling): Rename stmt_cost_vec to
> 	body_cost_vec.
> 	(vect_enhance_data_refs_alignment): Rename stmt_cost_vec to
> 	body_cost_vec; add extra dummy parameter on calls to
> 	vect_get_data_access_cost; tolerate null si->stmt; add vect_body to
> 	argument list on call to add_stmt_cost.
> 	* tree-vect-stmts.c (record_stmt_cost): Change parameter list;
> 	rename stmt_cost_vec to body_cost_vec; tolerate null stmt_info; call
> 	builtin_vectorization_cost; add "where" parameter on call to
> 	add_stmt_cost.
> 	(vect_model_simple_cost): Change parameter list; call record_stmt_cost
> 	for prologue costs; remove call to stmt_vinfo_set_outside_of_loop_cost;
> 	rename stmt_cost_vec to body_cost_vec.
> 	(vect_model_promotion_demotion_cost): Add vect_body argument to call
> 	to add_stmt_cost; call add_stmt_cost for prologue costs; remove call
> 	to stmt_vinfo_set_outside_of_loop_cost.
> 	(vect_model_store_cost): Change parameter list; call record_stmt_cost
> 	for prologue costs; add vect_body argument to call to record_stmt_cost;
> 	rename stmt_cost_vec to body_cost_vec; remove call to
> 	stmt_vinfo_set_outside_of_loop_cost.
> 	(vect_get_store_cost): Rename stmt_cost_vec to body_cost_vec; add
> 	vect_body argument to calls to record_stmt_cost.
> 	(vect_model_load_cost): Change parameter list; rename stmt_cost_vec to
> 	body_cost_vec; add vect_body argument to calls to record_stmt_cost;
> 	remove call to stmt_vinfo_set_outside_of_loop_cost.
> 	(vect_get_load_cost): Change parameter list; rename stmt_cost_vec to
> 	body_cost_vec; add vect_body argument to calls to record_stmt_cost;
> 	call record_stmt_cost for prologue costs.
> 	(vectorizable_store): Change argument list for call to
> 	vect_model_store_cost.
> 	(vectorizable_load): Change argument list for call to
> 	vect_model_load_cost.
> 	(new_stmt_vec_info): Remove assignment to
> 	STMT_VINFO_OUTSIDE_OF_LOOP_COST.
> 	* config/spu/spu.c (spu_init_cost): Add prologue and epilogue costs.
> 	(spu_add_stmt_cost): Likewise; also handle NULL stmt_info.
> 	(spu_finish_cost): Add prologue and epilogue costs.
> 	* config/i386/i386.c (i386_init_cost): Add prologue and epilogue costs.
> 	(i386_add_stmt_cost): Likewise; also handle NULL stmt_info.
> 	(i386_finish_cost): Add prologue and epilogue costs.
> 	* config/rs6000/rs6000.c (rs6000_init_cost): Add prologue and epilogue
> 	costs.
> 	(rs6000_add_stmt_cost): Likewise; also handle NULL stmt_info.
> 	(rs6000_finish_cost): Add prologue and epilogue costs.
> 	* tree-vect-slp.c (vect_free_slp_instance): Rename
> 	SLP_INSTANCE_STMT_COST_VEC to SLP_INSTANCE_BODY_COST_VEC.
> 	(vect_create_new_slp_node): Remove assignment to
> 	SLP_TREE_OUTSIDE_OF_LOOP_COST.
> 	(vect_get_and_check_slp_defs): Change parameter list; change argument
> 	lists to calls to vect_model_store_cost and vect_model_simple_cost.
> 	(vect_build_slp_tree): Change parameter list; change argument lists
> 	to calls to vect_model_load_cost, vect_get_and_check_slp_defs, and
> 	recursive self-calls; remove setting of outside_cost from
> 	SLP_TREE_OUTSIDE_OF_LOOP_COST; add vect_body argument to call to
> 	record_stmt_cost.
> 	(vect_analyze_slp_instance): Rename stmt_cost_vec to body_cost_vec;
> 	rename SLP_INSTANCE_STMT_COST_VEC to SLP_INSTANCE_BODY_COST_VEC;
> 	remove assignment to SLP_INSTANCE_OUTSIDE_OF_LOOP_COST; record SLP
> 	prologue costs.
> 	(vect_bb_vectorization_profitable_p): Rename stmt_cost_vec to
> 	body_cost_vec; handle null ci->stmt; add vect_body argument to call
> 	to add_stmt_cost; simplify calls to targetm.vectorize.
> 	builtin_vectorization_cost; return vec_prologue_cost and
> 	vec_epilogue_cost from finish_cost.
> 	(vect_update_slp_costs_according_to_vf): Rename stmt_cost_vec to
> 	body_cost_vec; add vect_body argument to call to add_stmt_cost.
> 
> 
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi	(revision 189574)
> +++ gcc/doc/tm.texi	(working copy)
> @@ -5771,15 +5771,15 @@ The default is zero which means to not iterate ove
>  @end deftypefn
>  
>  @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
> -This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates an unsigned integer for accumulating a single cost.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
> +This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
>  @end deftypefn
>  
> -@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_ADD_STMT_COST (void *@var{data}, int @var{count}, enum vect_cost_for_stmt @var{kind}, struct _stmt_vec_info *@var{stmt_info}, int @var{misalign})
> -This hook should update the target-specific @var{data} in response to adding @var{count} copies of the given @var{kind} of statement to the body of a loop or basic block.  The default adds the builtin vectorizer cost for the copies of the statement to the accumulator, and returns the amount added.  The return value should be viewed as a tentative cost that may later be overridden.
> +@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_ADD_STMT_COST (void *@var{data}, int @var{count}, enum vect_cost_for_stmt @var{kind}, struct _stmt_vec_info *@var{stmt_info}, int @var{misalign}, enum vect_cost_model_location @var{where})
> +This hook should update the target-specific @var{data} in response to adding @var{count} copies of the given @var{kind} of statement to a loop or basic block.  The default adds the builtin vectorizer cost for the copies of the statement to the accumulator specified by @var{where}, (the prologue, body, or epilogue) and returns the amount added.  The return value should be viewed as a tentative cost that may later be revised.
>  @end deftypefn
>  
> -@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_FINISH_COST (void *@var{data})
> -This hook should complete calculations of the cost of vectorizing a loop or basic block based on @var{data}, and return that cost as an unsigned integer.  The default returns the value of the accumulator.
> +@deftypefn {Target Hook} void TARGET_VECTORIZE_FINISH_COST (void *@var{data}, unsigned *@var{prologue_cost}, unsigned *@var{body_cost}, unsigned *@var{epilogue_cost})
> +This hook should complete calculations of the cost of vectorizing a loop or basic block based on @var{data}, and return the prologue, body, and epilogue costs as unsigned integers.  The default returns the value of the three accumulators.
>  @end deftypefn
>  
>  @deftypefn {Target Hook} void TARGET_VECTORIZE_DESTROY_COST_DATA (void *@var{data})
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c	(revision 189574)
> +++ gcc/targhooks.c	(working copy)
> @@ -996,54 +996,58 @@ default_autovectorize_vector_sizes (void)
>    return 0;
>  }
>  
> -/* By default, the cost model just accumulates the inside_loop costs for
> -   a vectorized loop or block.  So allocate an unsigned int, set it to
> -   zero, and return its address.  */
> +/* By default, the cost model accumulates three separate costs (prologue,
> +   loop body, and epilogue) for a vectorized loop or block.  So allocate an
> +   array of three unsigned ints, set it to zero, and return its address.  */
>  
>  void *
>  default_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
>  {
> -  unsigned *cost = XNEW (unsigned);
> -  *cost = 0;
> +  unsigned *cost = XNEWVEC (unsigned, 3);
> +  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
>    return cost;
>  }
>  
>  /* By default, the cost model looks up the cost of the given statement
>     kind and mode, multiplies it by the occurrence count, accumulates
> -   it into the cost, and returns the cost added.  */
> +   it into the cost specified by WHERE, and returns the cost added.  */
>  
>  unsigned
>  default_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> -		       struct _stmt_vec_info *stmt_info, int misalign)
> +		       struct _stmt_vec_info *stmt_info, int misalign,
> +		       enum vect_cost_model_location where)
>  {
>    unsigned *cost = (unsigned *) data;
>    unsigned retval = 0;
>  
>    if (flag_vect_cost_model)
>      {
> -      tree vectype = stmt_vectype (stmt_info);
> +      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>        int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
>  							  misalign);
>        /* Statements in an inner loop relative to the loop being
>  	 vectorized are weighted more heavily.  The value here is
>  	 arbitrary and could potentially be improved with analysis.  */
> -      if (stmt_in_inner_loop_p (stmt_info))
> +      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
>  	count *= 50;  /* FIXME.  */
>  
>        retval = (unsigned) (count * stmt_cost);
> -      *cost += retval;
> +      cost[where] += retval;
>      }
>  
>    return retval;
>  }
>  
> -/* By default, the cost model just returns the accumulated
> -   inside_loop cost.  */
> +/* By default, the cost model just returns the accumulated costs.  */
>  
> -unsigned
> -default_finish_cost (void *data)
> +void
> +default_finish_cost (void *data, unsigned *prologue_cost,
> +		     unsigned *body_cost, unsigned *epilogue_cost)
>  {
> -  return *((unsigned *) data);
> +  unsigned *cost = (unsigned *) data;
> +  *prologue_cost = cost[vect_prologue];
> +  *body_cost     = cost[vect_body];
> +  *epilogue_cost = cost[vect_epilogue];
>  }
>  
>  /* Free the cost data.  */
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h	(revision 189574)
> +++ gcc/targhooks.h	(working copy)
> @@ -92,8 +92,9 @@ extern enum machine_mode default_preferred_simd_mo
>  extern unsigned int default_autovectorize_vector_sizes (void);
>  extern void *default_init_cost (struct loop *);
>  extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
> -				       struct _stmt_vec_info *, int);
> -extern unsigned default_finish_cost (void *);
> +				       struct _stmt_vec_info *, int,
> +				       enum vect_cost_model_location);
> +extern void default_finish_cost (void *, unsigned *, unsigned *, unsigned *);
>  extern void default_destroy_cost_data (void *);
>  
>  /* These are here, and not in hooks.[ch], because not all users of
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def	(revision 189574)
> +++ gcc/target.def	(working copy)
> @@ -1054,27 +1054,30 @@ DEFHOOK
>  (init_cost,
>   "This hook should initialize target-specific data structures in preparation "
>   "for modeling the costs of vectorizing a loop or basic block.  The default "
> - "allocates an unsigned integer for accumulating a single cost.  "
> - "If @var{loop_info} is non-NULL, it identifies the loop being vectorized; "
> - "otherwise a single block is being vectorized.",
> + "allocates three unsigned integers for accumulating costs for the prologue, "
> + "body, and epilogue of the loop or basic block.  If @var{loop_info} is "
> + "non-NULL, it identifies the loop being vectorized; otherwise a single block "
> + "is being vectorized.",
>   void *,
>   (struct loop *loop_info),
>   default_init_cost)
>  
>  /* Target function to record N statements of the given kind using the
> -   given vector type within the cost model data for the current loop
> -   or block.  */
> +   given vector type within the cost model data for the current loop or
> +    block.  */
>  DEFHOOK
>  (add_stmt_cost,
>   "This hook should update the target-specific @var{data} in response to "
> - "adding @var{count} copies of the given @var{kind} of statement to the "
> - "body of a loop or basic block.  The default adds the builtin vectorizer "
> - "cost for the copies of the statement to the accumulator, and returns "
> - "the amount added.  The return value should be viewed as a tentative "
> - "cost that may later be overridden.",
> + "adding @var{count} copies of the given @var{kind} of statement to a "
> + "loop or basic block.  The default adds the builtin vectorizer cost for "
> + "the copies of the statement to the accumulator specified by @var{where}, "
> + "(the prologue, body, or epilogue) and returns the amount added.  The "
> + "return value should be viewed as a tentative cost that may later be "
> + "revised.",
>   unsigned,
>   (void *data, int count, enum vect_cost_for_stmt kind,
> -  struct _stmt_vec_info *stmt_info, int misalign),
> +  struct _stmt_vec_info *stmt_info, int misalign,
> +  enum vect_cost_model_location where),
>   default_add_stmt_cost)
>  
>  /* Target function to calculate the total cost of the current vectorized
> @@ -1082,10 +1085,12 @@ DEFHOOK
>  DEFHOOK
>  (finish_cost,
>   "This hook should complete calculations of the cost of vectorizing a loop "
> - "or basic block based on @var{data}, and return that cost as an unsigned "
> - "integer.  The default returns the value of the accumulator.",
> - unsigned,
> - (void *data),
> + "or basic block based on @var{data}, and return the prologue, body, and "
> + "epilogue costs as unsigned integers.  The default returns the value of "
> + "the three accumulators.",
> + void,
> + (void *data, unsigned *prologue_cost, unsigned *body_cost,
> +  unsigned *epilogue_cost),
>   default_finish_cost)
>  
>  /* Function to delete target-specific cost modeling data.  */
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h	(revision 189574)
> +++ gcc/target.h	(working copy)
> @@ -157,6 +157,14 @@ enum vect_cost_for_stmt
>    vec_construct
>  };
>  
> +/* Separate locations for which the vectorizer cost model should
> +   track costs.  */
> +enum vect_cost_model_location {
> +  vect_prologue = 0,
> +  vect_body = 1,
> +  vect_epilogue = 2
> +};
> +
>  /* The target structure.  This holds all the backend hooks.  */
>  #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
>  #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h	(revision 189574)
> +++ gcc/tree-vectorizer.h	(working copy)
> @@ -118,11 +118,6 @@ typedef struct _slp_tree {
>       scalar elements in one scalar iteration (GROUP_SIZE) multiplied by VF
>       divided by vector size.  */
>    unsigned int vec_stmts_size;
> -  /* Vectorization costs associated with SLP node.  */
> -  struct
> -  {
> -    int outside_of_loop;     /* Statements generated outside loop.  */
> -  } cost;
>  } *slp_tree;
>  
>  DEF_VEC_P(slp_tree);
> @@ -141,14 +136,8 @@ typedef struct _slp_instance {
>    unsigned int unrolling_factor;
>  
>    /* Vectorization costs associated with SLP instance.  */
> -  struct
> -  {
> -    int outside_of_loop;     /* Statements generated outside loop.  */
> -  } cost;
> +  stmt_vector_for_cost body_cost_vec;
>  
> -  /* Inside-loop costs.  */
> -  stmt_vector_for_cost stmt_cost_vec;
> -
>    /* Loads permutation relatively to the stores, NULL if there is no
>       permutation.  */
>    VEC (int, heap) *load_permutation;
> @@ -168,8 +157,7 @@ DEF_VEC_ALLOC_P(slp_instance, heap);
>  #define SLP_INSTANCE_TREE(S)                     (S)->root
>  #define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
>  #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
> -#define SLP_INSTANCE_OUTSIDE_OF_LOOP_COST(S)     (S)->cost.outside_of_loop
> -#define SLP_INSTANCE_STMT_COST_VEC(S)            (S)->stmt_cost_vec
> +#define SLP_INSTANCE_BODY_COST_VEC(S)            (S)->body_cost_vec
>  #define SLP_INSTANCE_LOAD_PERMUTATION(S)         (S)->load_permutation
>  #define SLP_INSTANCE_LOADS(S)                    (S)->loads
>  #define SLP_INSTANCE_FIRST_LOAD_STMT(S)          (S)->first_load
> @@ -178,7 +166,6 @@ DEF_VEC_ALLOC_P(slp_instance, heap);
>  #define SLP_TREE_SCALAR_STMTS(S)                 (S)->stmts
>  #define SLP_TREE_VEC_STMTS(S)                    (S)->vec_stmts
>  #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)          (S)->vec_stmts_size
> -#define SLP_TREE_OUTSIDE_OF_LOOP_COST(S)         (S)->cost.outside_of_loop
>  
>  /* This structure is used in creation of an SLP tree.  Each instance
>     corresponds to the same operand in a group of scalar stmts in an SLP
> @@ -212,7 +199,7 @@ typedef struct _vect_peel_extended_info
>    struct _vect_peel_info peel_info;
>    unsigned int inside_cost;
>    unsigned int outside_cost;
> -  stmt_vector_for_cost stmt_cost_vec;
> +  stmt_vector_for_cost body_cost_vec;
>  } *vect_peel_extended_info;
>  
>  /*-----------------------------------------------------------------*/
> @@ -566,12 +553,6 @@ typedef struct _stmt_vec_info {
>       indicates whether the stmt needs to be vectorized.  */
>    enum vect_relevant relevant;
>  
> -  /* Vectorization costs associated with statement.  */
> -  struct
> -  {
> -    int outside_of_loop;     /* Statements generated outside loop.  */
> -  } cost;
> -
>    /* The bb_vec_info with respect to which STMT is vectorized.  */
>    bb_vec_info bb_vinfo;
>  
> @@ -628,7 +609,6 @@ typedef struct _stmt_vec_info {
>  #define GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
>  
>  #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_scope)
> -#define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
>  
>  #define HYBRID_SLP_STMT(S)                ((S)->slp_type == hybrid)
>  #define PURE_SLP_STMT(S)                  ((S)->slp_type == pure_slp)
> @@ -767,18 +747,6 @@ is_loop_header_bb_p (basic_block bb)
>    return false;
>  }
>  
> -/* Set outside loop vectorization cost.  */
> -
> -static inline void
> -stmt_vinfo_set_outside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node,
> -				     int cost)
> -{
> -  if (slp_node)
> -    SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node) = cost;
> -  else
> -    STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
> -}
> -
>  /* Return pow2 (X).  */
>  
>  static inline int
> @@ -792,16 +760,22 @@ vect_pow2 (int x)
>    return res;
>  }
>  
> +/* Alias targetm.vectorize.builtin_vectorization_cost.  */
> +
> +static inline int
> +builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> +			    tree vectype, int misalign)
> +{
> +  return targetm.vectorize.builtin_vectorization_cost (type_of_cost,
> +						       vectype, misalign);
> +}
> +
>  /* Get cost by calling cost target builtin.  */
>  
>  static inline
>  int vect_get_stmt_cost (enum vect_cost_for_stmt type_of_cost)
>  {
> -  tree dummy_type = NULL;
> -  int dummy = 0;
> -
> -  return targetm.vectorize.builtin_vectorization_cost (type_of_cost,
> -                                                       dummy_type, dummy);
> +  return builtin_vectorization_cost (type_of_cost, NULL, 0);
>  }
>  
>  /* Alias targetm.vectorize.init_cost.  */
> @@ -816,18 +790,20 @@ init_cost (struct loop *loop_info)
>  
>  static inline unsigned
>  add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> -	       stmt_vec_info stmt_info, int misalign)
> +	       stmt_vec_info stmt_info, int misalign,
> +	       enum vect_cost_model_location where)
>  {
>    return targetm.vectorize.add_stmt_cost (data, count, kind,
> -					  stmt_info, misalign);
> +					  stmt_info, misalign, where);
>  }
>  
>  /* Alias targetm.vectorize.finish_cost.  */
>  
> -static inline unsigned
> -finish_cost (void *data)
> +static inline void
> +finish_cost (void *data, unsigned *prologue_cost,
> +	     unsigned *body_cost, unsigned *epilogue_cost)
>  {
> -  return targetm.vectorize.finish_cost (data);
> +  targetm.vectorize.finish_cost (data, prologue_cost, body_cost, epilogue_cost);
>  }
>  
>  /* Alias targetm.vectorize.destroy_cost_data.  */
> @@ -905,14 +881,18 @@ extern stmt_vec_info new_stmt_vec_info (gimple stm
>  extern void free_stmt_vec_info (gimple stmt);
>  extern tree vectorizable_function (gimple, tree, tree);
>  extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
> -                                    slp_tree, stmt_vector_for_cost *);
> +                                    stmt_vector_for_cost *,
> +				    stmt_vector_for_cost *);
>  extern void vect_model_store_cost (stmt_vec_info, int, bool,
>  				   enum vect_def_type, slp_tree,
> +				   stmt_vector_for_cost *,
>  				   stmt_vector_for_cost *);
>  extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree,
> +				  stmt_vector_for_cost *,
>  				  stmt_vector_for_cost *);
>  extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
> -				  enum vect_cost_for_stmt, stmt_vec_info, int);
> +				  enum vect_cost_for_stmt, stmt_vec_info,
> +				  int, enum vect_cost_model_location);
>  extern void vect_finish_stmt_generation (gimple, gimple,
>                                           gimple_stmt_iterator *);
>  extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
> @@ -928,7 +908,8 @@ extern bool vectorizable_condition (gimple, gimple
>                                      tree, int, slp_tree);
>  extern void vect_get_load_cost (struct data_reference *, int, bool,
>  				unsigned int *, unsigned int *,
> -				stmt_vector_for_cost *);
> +				stmt_vector_for_cost *,
> +				stmt_vector_for_cost *, bool);
>  extern void vect_get_store_cost (struct data_reference *, int,
>  				 unsigned int *, stmt_vector_for_cost *);
>  extern bool vect_supportable_shift (enum tree_code, tree);
> @@ -992,7 +973,9 @@ extern bool vectorizable_induction (gimple, gimple
>  extern int vect_estimate_min_profitable_iters (loop_vec_info);
>  extern tree get_initial_def_for_reduction (gimple, tree, tree *);
>  extern int vect_min_worthwhile_factor (enum tree_code);
> -extern int vect_get_known_peeling_cost (loop_vec_info, int, int *, int);
> +extern int vect_get_known_peeling_cost (loop_vec_info, int, int *, int,
> +					stmt_vector_for_cost *,
> +					stmt_vector_for_cost *);
>  extern int vect_get_single_scalar_iteration_cost (loop_vec_info);
>  
>  /* In tree-vect-slp.c.  */
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c	(revision 189574)
> +++ gcc/tree-vect-loop.c	(working copy)
> @@ -2440,9 +2440,11 @@ vect_get_single_scalar_iteration_cost (loop_vec_in
>  int
>  vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
>                               int *peel_iters_epilogue,
> -                             int scalar_single_iter_cost)
> +                             int scalar_single_iter_cost,
> +			     stmt_vector_for_cost *prologue_cost_vec,
> +			     stmt_vector_for_cost *epilogue_cost_vec)
>  {
> -  int peel_guard_costs = 0;
> +  int retval = 0;
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>  
>    if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> @@ -2455,7 +2457,8 @@ vect_get_known_peeling_cost (loop_vec_info loop_vi
>  
>        /* If peeled iterations are known but number of scalar loop
>           iterations are unknown, count a taken branch per peeled loop.  */
> -      peel_guard_costs =  2 * vect_get_stmt_cost (cond_branch_taken);
> +      retval = record_stmt_cost (prologue_cost_vec, 2, cond_branch_taken,
> +				 NULL, 0, vect_prologue);
>      }
>    else
>      {
> @@ -2469,9 +2472,15 @@ vect_get_known_peeling_cost (loop_vec_info loop_vi
>          *peel_iters_epilogue = vf;
>      }
>  
> -   return (peel_iters_prologue * scalar_single_iter_cost)
> -            + (*peel_iters_epilogue * scalar_single_iter_cost)
> -           + peel_guard_costs;
> +  if (peel_iters_prologue)
> +    retval += record_stmt_cost (prologue_cost_vec,
> +				peel_iters_prologue * scalar_single_iter_cost,
> +				scalar_stmt, NULL, 0, vect_prologue);
> +  if (*peel_iters_epilogue)
> +    retval += record_stmt_cost (epilogue_cost_vec,
> +				*peel_iters_epilogue * scalar_single_iter_cost,
> +				scalar_stmt, NULL, 0, vect_epilogue);
> +  return retval;
>  }
>  
>  /* Function vect_estimate_min_profitable_iters
> @@ -2486,22 +2495,18 @@ vect_get_known_peeling_cost (loop_vec_info loop_vi
>  int
>  vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo)
>  {
> -  int i;
>    int min_profitable_iters;
>    int peel_iters_prologue;
>    int peel_iters_epilogue;
> -  int vec_inside_cost = 0;
> +  unsigned vec_inside_cost = 0;
>    int vec_outside_cost = 0;
> +  unsigned vec_prologue_cost = 0;
> +  unsigned vec_epilogue_cost = 0;
>    int scalar_single_iter_cost = 0;
>    int scalar_outside_cost = 0;
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> -  int nbbs = loop->num_nodes;
>    int npeel = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
> -  int peel_guard_costs = 0;
> -  VEC (slp_instance, heap) *slp_instances;
> -  slp_instance instance;
> +  void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
>  
>    /* Cost model disabled.  */
>    if (!flag_vect_cost_model)
> @@ -2515,8 +2520,10 @@ vect_estimate_min_profitable_iters (loop_vec_info
>    if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo))
>      {
>        /*  FIXME: Make cost depend on complexity of individual check.  */
> -      vec_outside_cost +=
> -	VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
> +      unsigned len = VEC_length (gimple,
> +				 LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
> +      (void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0,
> +			    vect_prologue);
>        if (vect_print_dump_info (REPORT_COST))
>          fprintf (vect_dump, "cost model: Adding cost of checks for loop "
>                   "versioning to treat misalignment.\n");
> @@ -2526,8 +2533,9 @@ vect_estimate_min_profitable_iters (loop_vec_info
>    if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
>      {
>        /*  FIXME: Make cost depend on complexity of individual check.  */
> -      vec_outside_cost +=
> -        VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
> +      unsigned len = VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
> +      (void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0,
> +			    vect_prologue);
>        if (vect_print_dump_info (REPORT_COST))
>          fprintf (vect_dump, "cost model: Adding cost of checks for loop "
>                   "versioning aliasing.\n");
> @@ -2535,7 +2543,8 @@ vect_estimate_min_profitable_iters (loop_vec_info
>  
>    if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
>        || LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
> -    vec_outside_cost += vect_get_stmt_cost (cond_branch_taken); 
> +    (void) add_stmt_cost (target_cost_data, 1, cond_branch_taken, NULL, 0,
> +			  vect_prologue);
>  
>    /* Count statements in scalar loop.  Using this as scalar cost for a single
>       iteration for now.
> @@ -2545,52 +2554,6 @@ vect_estimate_min_profitable_iters (loop_vec_info
>       TODO: Consider assigning different costs to different scalar
>       statements.  */
>  
> -  for (i = 0; i < nbbs; i++)
> -    {
> -      gimple_stmt_iterator si;
> -      basic_block bb = bbs[i];
> -
> -      for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> -	{
> -	  gimple stmt = gsi_stmt (si);
> -	  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> -
> -	  if (STMT_VINFO_IN_PATTERN_P (stmt_info))
> -	    {
> -	      stmt = STMT_VINFO_RELATED_STMT (stmt_info);
> -	      stmt_info = vinfo_for_stmt (stmt);
> -	    }
> -
> -	  /* Skip stmts that are not vectorized inside the loop.  */
> -	  if (!STMT_VINFO_RELEVANT_P (stmt_info)
> -	      && (!STMT_VINFO_LIVE_P (stmt_info)
> -                 || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))))
> -	    continue;
> -
> -	  /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
> -	     some of the "outside" costs are generated inside the outer-loop.  */
> -	  vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
> -          if (is_pattern_stmt_p (stmt_info)
> -	      && STMT_VINFO_PATTERN_DEF_SEQ (stmt_info))
> -            {
> -	      gimple_stmt_iterator gsi;
> -	      
> -	      for (gsi = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> -		   !gsi_end_p (gsi); gsi_next (&gsi))
> -                {
> -                  gimple pattern_def_stmt = gsi_stmt (gsi);
> -                  stmt_vec_info pattern_def_stmt_info
> -		    = vinfo_for_stmt (pattern_def_stmt);
> -                  if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)
> -                      || STMT_VINFO_LIVE_P (pattern_def_stmt_info))
> -		    vec_outside_cost
> -		      += STMT_VINFO_OUTSIDE_OF_LOOP_COST
> -		        (pattern_def_stmt_info);
> -		}
> -	    }
> -	}
> -    }
> -
>    scalar_single_iter_cost = vect_get_single_scalar_iteration_cost (loop_vinfo);
>  
>    /* Add additional cost for the peeled instructions in prologue and epilogue
> @@ -2621,18 +2584,54 @@ vect_estimate_min_profitable_iters (loop_vec_info
>           branch per peeled loop. Even if scalar loop iterations are known,
>           vector iterations are not known since peeled prologue iterations are
>           not known. Hence guards remain the same.  */
> -      peel_guard_costs +=  2 * (vect_get_stmt_cost (cond_branch_taken)
> -                                + vect_get_stmt_cost (cond_branch_not_taken));
> -      vec_outside_cost += (peel_iters_prologue * scalar_single_iter_cost)
> -                           + (peel_iters_epilogue * scalar_single_iter_cost)
> -                           + peel_guard_costs;
> +      (void) add_stmt_cost (target_cost_data, 2, cond_branch_taken,
> +			    NULL, 0, vect_prologue);
> +      (void) add_stmt_cost (target_cost_data, 2, cond_branch_not_taken,
> +			    NULL, 0, vect_prologue);
> +      /* FORNOW: Don't attempt to pass individual scalar instructions to
> +	 the model; just assume linear cost for scalar iterations.  */
> +      (void) add_stmt_cost (target_cost_data,
> +			    peel_iters_prologue * scalar_single_iter_cost,
> +			    scalar_stmt, NULL, 0, vect_prologue);
> +      (void) add_stmt_cost (target_cost_data,
> +			    peel_iters_epilogue * scalar_single_iter_cost,
> +			    scalar_stmt, NULL, 0, vect_epilogue);
>      }
>    else
>      {
> +      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
> +      stmt_info_for_cost *si;
> +      int j;
> +      void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
> +
> +      prologue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
> +      epilogue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
>        peel_iters_prologue = npeel;
> -      vec_outside_cost += vect_get_known_peeling_cost (loop_vinfo,
> -                                    peel_iters_prologue, &peel_iters_epilogue,
> -                                    scalar_single_iter_cost);
> +
> +      (void) vect_get_known_peeling_cost (loop_vinfo, peel_iters_prologue,
> +					  &peel_iters_epilogue,
> +					  scalar_single_iter_cost,
> +					  &prologue_cost_vec,
> +					  &epilogue_cost_vec);
> +
> +      FOR_EACH_VEC_ELT (stmt_info_for_cost, prologue_cost_vec, j, si)
> +	{
> +	  struct _stmt_vec_info *stmt_info
> +	    = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
> +	  (void) add_stmt_cost (data, si->count, si->kind, stmt_info,
> +				si->misalign, vect_prologue);
> +	}
> +
> +      FOR_EACH_VEC_ELT (stmt_info_for_cost, epilogue_cost_vec, j, si)
> +	{
> +	  struct _stmt_vec_info *stmt_info
> +	    = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
> +	  (void) add_stmt_cost (data, si->count, si->kind, stmt_info,
> +				si->misalign, vect_epilogue);
> +	}
> +
> +      VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
> +      VEC_free (stmt_info_for_cost, heap, epilogue_cost_vec);
>      }
>  
>    /* FORNOW: The scalar outside cost is incremented in one of the
> @@ -2708,14 +2707,11 @@ vect_estimate_min_profitable_iters (loop_vec_info
>  	}
>      }
>  
> -  /* Add SLP costs.  */
> -  slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
> -  FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
> -    vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
> +  /* Complete the target-specific cost calculations.  */
> +  finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), &vec_prologue_cost,
> +	       &vec_inside_cost, &vec_epilogue_cost);
>  
> -  /* Complete the target-specific cost calculation for the inside-of-loop
> -     costs.  */
> -  vec_inside_cost = finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
> +  vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
>    
>    /* Calculate number of iterations required to make the vector version
>       profitable, relative to the loop bodies only.  The following condition
> @@ -2727,7 +2723,7 @@ vect_estimate_min_profitable_iters (loop_vec_info
>       PL_ITERS = prologue iterations, EP_ITERS= epilogue iterations
>       SOC = scalar outside cost for run time cost model check.  */
>  
> -  if ((scalar_single_iter_cost * vf) > vec_inside_cost)
> +  if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost)
>      {
>        if (vec_outside_cost <= 0)
>          min_profitable_iters = 1;
> @@ -2740,8 +2736,8 @@ vect_estimate_min_profitable_iters (loop_vec_info
>                                      - vec_inside_cost);
>  
>            if ((scalar_single_iter_cost * vf * min_profitable_iters)
> -              <= ((vec_inside_cost * min_profitable_iters)
> -                  + ((vec_outside_cost - scalar_outside_cost) * vf)))
> +              <= (((int) vec_inside_cost * min_profitable_iters)
> +                  + (((int) vec_outside_cost - scalar_outside_cost) * vf)))
>              min_profitable_iters++;
>          }
>      }
> @@ -2761,8 +2757,10 @@ vect_estimate_min_profitable_iters (loop_vec_info
>        fprintf (vect_dump, "Cost model analysis: \n");
>        fprintf (vect_dump, "  Vector inside of loop cost: %d\n",
>  	       vec_inside_cost);
> -      fprintf (vect_dump, "  Vector outside of loop cost: %d\n",
> -	       vec_outside_cost);
> +      fprintf (vect_dump, "  Vector prologue cost: %d\n",
> +	       vec_prologue_cost);
> +      fprintf (vect_dump, "  Vector epilogue cost: %d\n",
> +	       vec_epilogue_cost);
>        fprintf (vect_dump, "  Scalar iteration cost: %d\n",
>  	       scalar_single_iter_cost);
>        fprintf (vect_dump, "  Scalar outside cost: %d\n", scalar_outside_cost);
> @@ -2803,7 +2801,7 @@ static bool
>  vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
>  			   int ncopies)
>  {
> -  int outer_cost = 0;
> +  int prologue_cost = 0, epilogue_cost = 0;
>    enum tree_code code;
>    optab optab;
>    tree vectype;
> @@ -2812,12 +2810,11 @@ vect_model_reduction_cost (stmt_vec_info stmt_info
>    enum machine_mode mode;
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
>  
>    /* Cost of reduction op inside loop.  */
> -  unsigned inside_cost
> -    = add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> -		     ncopies, vector_stmt, stmt_info, 0);
> -
> +  unsigned inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt,
> +					stmt_info, 0, vect_body);
>    stmt = STMT_VINFO_STMT (stmt_info);
>  
>    switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
> @@ -2859,7 +2856,8 @@ vect_model_reduction_cost (stmt_vec_info stmt_info
>    code = gimple_assign_rhs_code (orig_stmt);
>  
>    /* Add in cost for initial definition.  */
> -  outer_cost += vect_get_stmt_cost (scalar_to_vec);
> +  prologue_cost += add_stmt_cost (target_cost_data, 1, scalar_to_vec,
> +				  stmt_info, 0, vect_prologue);
>  
>    /* Determine cost of epilogue code.
>  
> @@ -2869,8 +2867,12 @@ vect_model_reduction_cost (stmt_vec_info stmt_info
>    if (!nested_in_vect_loop_p (loop, orig_stmt))
>      {
>        if (reduc_code != ERROR_MARK)
> -	outer_cost += vect_get_stmt_cost (vector_stmt) 
> -                      + vect_get_stmt_cost (vec_to_scalar); 
> +	{
> +	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
> +					  stmt_info, 0, vect_epilogue);
> +	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vec_to_scalar,
> +					  stmt_info, 0, vect_epilogue);
> +	}
>        else
>  	{
>  	  int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
> @@ -2885,25 +2887,31 @@ vect_model_reduction_cost (stmt_vec_info stmt_info
>  	  if (VECTOR_MODE_P (mode)
>  	      && optab_handler (optab, mode) != CODE_FOR_nothing
>  	      && optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
> -	    /* Final reduction via vector shifts and the reduction operator. Also
> -	       requires scalar extract.  */
> -	    outer_cost += ((exact_log2(nelements) * 2) 
> -              * vect_get_stmt_cost (vector_stmt) 
> -  	      + vect_get_stmt_cost (vec_to_scalar));
> +	    {
> +	      /* Final reduction via vector shifts and the reduction operator.
> +		 Also requires scalar extract.  */
> +	      epilogue_cost += add_stmt_cost (target_cost_data,
> +					      exact_log2 (nelements) * 2,
> +					      vector_stmt, stmt_info, 0,
> +					      vect_epilogue);
> +	      epilogue_cost += add_stmt_cost (target_cost_data, 1,
> +					      vec_to_scalar, stmt_info, 0,
> +					      vect_epilogue);
> +	    }
>  	  else
> -	    /* Use extracts and reduction op for final reduction.  For N elements,
> -               we have N extracts and N-1 reduction ops.  */
> -	    outer_cost += ((nelements + nelements - 1) 
> -              * vect_get_stmt_cost (vector_stmt));
> +	    /* Use extracts and reduction op for final reduction.  For N
> +	       elements, we have N extracts and N-1 reduction ops.  */
> +	    epilogue_cost += add_stmt_cost (target_cost_data,
> +					    nelements + nelements - 1,
> +					    vector_stmt, stmt_info, 0,
> +					    vect_epilogue);
>  	}
>      }
>  
> -  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
> -
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_reduction_cost: inside_cost = %d, "
> -             "outside_cost = %d .", inside_cost,
> -             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
> +             "prologue_cost = %d, epilogue_cost = %d .", inside_cost,
> +	     prologue_cost, epilogue_cost);
>  
>    return true;
>  }
> @@ -2917,20 +2925,20 @@ static void
>  vect_model_induction_cost (stmt_vec_info stmt_info, int ncopies)
>  {
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
> +  unsigned inside_cost, prologue_cost;
>  
>    /* loop cost for vec_loop.  */
> -  unsigned inside_cost
> -    = add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), ncopies,
> -		     vector_stmt, stmt_info, 0);
> +  inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt,
> +			       stmt_info, 0, vect_body);
>  
>    /* prologue cost for vec_init and vec_step.  */
> -  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)  
> -    = 2 * vect_get_stmt_cost (scalar_to_vec);
> +  prologue_cost = add_stmt_cost (target_cost_data, 2, scalar_to_vec,
> +				 stmt_info, 0, vect_prologue);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_induction_cost: inside_cost = %d, "
> -             "outside_cost = %d .", inside_cost,
> -             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
> +             "prologue_cost = %d .", inside_cost, prologue_cost);
>  }
>  
>  
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c	(revision 189574)
> +++ gcc/tree-vect-data-refs.c	(working copy)
> @@ -1204,10 +1204,11 @@ vector_alignment_reachable_p (struct data_referenc
>  
>  /* Calculate the cost of the memory access represented by DR.  */
>  
> -static stmt_vector_for_cost
> +static void
>  vect_get_data_access_cost (struct data_reference *dr,
>                             unsigned int *inside_cost,
> -                           unsigned int *outside_cost)
> +                           unsigned int *outside_cost,
> +			   stmt_vector_for_cost *body_cost_vec)
>  {
>    gimple stmt = DR_STMT (dr);
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> @@ -1215,19 +1216,16 @@ vect_get_data_access_cost (struct data_reference *
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>    int ncopies = vf / nunits;
> -  stmt_vector_for_cost stmt_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
>  
>    if (DR_IS_READ (dr))
> -    vect_get_load_cost (dr, ncopies, true, inside_cost,
> -			outside_cost, &stmt_cost_vec);
> +    vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost,
> +			NULL, body_cost_vec, false);
>    else
> -    vect_get_store_cost (dr, ncopies, inside_cost, &stmt_cost_vec);
> +    vect_get_store_cost (dr, ncopies, inside_cost, body_cost_vec);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
>               "outside_cost = %d.", *inside_cost, *outside_cost);
> -
> -  return stmt_cost_vec;
>  }
>  
>  
> @@ -1320,8 +1318,13 @@ vect_peeling_hash_get_lowest_cost (void **slot, vo
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    VEC (data_reference_p, heap) *datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
>    struct data_reference *dr;
> -  stmt_vector_for_cost stmt_cost_vec = NULL;
> +  stmt_vector_for_cost prologue_cost_vec, body_cost_vec, epilogue_cost_vec;
> +  int single_iter_cost;
>  
> +  prologue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
> +  body_cost_vec     = VEC_alloc (stmt_info_for_cost, heap, 2);
> +  epilogue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
> +
>    FOR_EACH_VEC_ELT (data_reference_p, datarefs, i, dr)
>      {
>        stmt = DR_STMT (dr);
> @@ -1334,23 +1337,35 @@ vect_peeling_hash_get_lowest_cost (void **slot, vo
>  
>        save_misalignment = DR_MISALIGNMENT (dr);
>        vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
> -      stmt_cost_vec = vect_get_data_access_cost (dr, &inside_cost,
> -						 &outside_cost);
> +      vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
> +				 &body_cost_vec);
>        SET_DR_MISALIGNMENT (dr, save_misalignment);
>      }
>  
> -  outside_cost += vect_get_known_peeling_cost (loop_vinfo, elem->npeel, &dummy,
> -                         vect_get_single_scalar_iteration_cost (loop_vinfo));
> +  single_iter_cost = vect_get_single_scalar_iteration_cost (loop_vinfo);
> +  outside_cost += vect_get_known_peeling_cost (loop_vinfo, elem->npeel,
> +					       &dummy, single_iter_cost,
> +					       &prologue_cost_vec,
> +					       &epilogue_cost_vec);
>  
> +  /* Prologue and epilogue costs are added to the target model later.
> +     These costs depend only on the scalar iteration cost, the
> +     number of peeling iterations finally chosen, and the number of
> +     misaligned statements.  So discard the information found here.  */
> +  VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
> +  VEC_free (stmt_info_for_cost, heap, epilogue_cost_vec);
> +
>    if (inside_cost < min->inside_cost
>        || (inside_cost == min->inside_cost && outside_cost < min->outside_cost))
>      {
>        min->inside_cost = inside_cost;
>        min->outside_cost = outside_cost;
> -      min->stmt_cost_vec = stmt_cost_vec;
> +      min->body_cost_vec = body_cost_vec;
>        min->peel_info.dr = elem->dr;
>        min->peel_info.npeel = elem->npeel;
>      }
> +  else
> +    VEC_free (stmt_info_for_cost, heap, body_cost_vec);
>  
>    return 1;
>  }
> @@ -1363,12 +1378,12 @@ vect_peeling_hash_get_lowest_cost (void **slot, vo
>  static struct data_reference *
>  vect_peeling_hash_choose_best_peeling (loop_vec_info loop_vinfo,
>                                         unsigned int *npeel,
> -				       stmt_vector_for_cost *stmt_cost_vec)
> +				       stmt_vector_for_cost *body_cost_vec)
>  {
>     struct _vect_peel_extended_info res;
>  
>     res.peel_info.dr = NULL;
> -   res.stmt_cost_vec = NULL;
> +   res.body_cost_vec = NULL;
>  
>     if (flag_vect_cost_model)
>       {
> @@ -1385,7 +1400,7 @@ vect_peeling_hash_choose_best_peeling (loop_vec_in
>       }
>  
>     *npeel = res.peel_info.npeel;
> -   *stmt_cost_vec = res.stmt_cost_vec;
> +   *body_cost_vec = res.body_cost_vec;
>     return res.peel_info.dr;
>  }
>  
> @@ -1502,7 +1517,7 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>    unsigned possible_npeel_number = 1;
>    tree vectype;
>    unsigned int nelements, mis, same_align_drs_max = 0;
> -  stmt_vector_for_cost stmt_cost_vec = NULL;
> +  stmt_vector_for_cost body_cost_vec = NULL;
>  
>    if (vect_print_dump_info (REPORT_DETAILS))
>      fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ===");
> @@ -1706,12 +1721,15 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>            unsigned int store_inside_cost = 0, store_outside_cost = 0;
>            unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
>            unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
> +	  stmt_vector_for_cost dummy = VEC_alloc (stmt_info_for_cost, heap, 2);
>  
> -          (void) vect_get_data_access_cost (dr0, &load_inside_cost,
> -					    &load_outside_cost);
> -          (void) vect_get_data_access_cost (first_store, &store_inside_cost,
> -					    &store_outside_cost);
> +          vect_get_data_access_cost (dr0, &load_inside_cost, &load_outside_cost,
> +				     &dummy);
> +          vect_get_data_access_cost (first_store, &store_inside_cost,
> +				     &store_outside_cost, &dummy);
>  
> +	  VEC_free (stmt_info_for_cost, heap, dummy);
> +
>            /* Calculate the penalty for leaving FIRST_STORE unaligned (by
>               aligning the load DR0).  */
>            load_inside_penalty = store_inside_cost;
> @@ -1775,7 +1793,7 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>  
>        /* Choose the best peeling from the hash table.  */
>        dr0 = vect_peeling_hash_choose_best_peeling (loop_vinfo, &npeel,
> -						   &stmt_cost_vec);
> +						   &body_cost_vec);
>        if (!dr0 || !npeel)
>          do_peeling = false;
>      }
> @@ -1860,6 +1878,7 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>        if (do_peeling)
>          {
>  	  stmt_info_for_cost *si;
> +	  void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
>  
>            /* (1.2) Update the DR_MISALIGNMENT of each data reference DR_i.
>               If the misalignment of DR_i is identical to that of dr0 then set
> @@ -1887,13 +1906,16 @@ vect_enhance_data_refs_alignment (loop_vec_info lo
>  	  /* We've delayed passing the inside-loop peeling costs to the
>  	     target cost model until we were sure peeling would happen.
>  	     Do so now.  */
> -	  if (stmt_cost_vec)
> +	  if (body_cost_vec)
>  	    {
> -	      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, i, si)
> -		(void) add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> -				      si->count, si->kind,
> -				      vinfo_for_stmt (si->stmt), si->misalign);
> -	      VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
> +	      FOR_EACH_VEC_ELT (stmt_info_for_cost, body_cost_vec, i, si)
> +		{
> +		  struct _stmt_vec_info *stmt_info
> +		    = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
> +		  (void) add_stmt_cost (data, si->count, si->kind, stmt_info,
> +					si->misalign, vect_body);
> +		}
> +	      VEC_free (stmt_info_for_cost, heap, body_cost_vec);
>  	    }
>  
>  	  stat = vect_verify_datarefs_alignment (loop_vinfo, NULL);
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c	(revision 189574)
> +++ gcc/tree-vect-stmts.c	(working copy)
> @@ -72,18 +72,18 @@ stmt_in_inner_loop_p (struct _stmt_vec_info *stmt_
>     Return a preliminary estimate of the statement's cost.  */
>  
>  unsigned
> -record_stmt_cost (stmt_vector_for_cost *stmt_cost_vec, int count,
> +record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
>  		  enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
> -		  int misalign)
> +		  int misalign, enum vect_cost_model_location where)
>  {
> -  if (stmt_cost_vec)
> +  if (body_cost_vec)
>      {
> -      tree vectype = stmt_vectype (stmt_info);
> -      add_stmt_info_to_vec (stmt_cost_vec, count, kind,
> -			    STMT_VINFO_STMT (stmt_info), misalign);
> +      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
> +      add_stmt_info_to_vec (body_cost_vec, count, kind,
> +			    stmt_info ? STMT_VINFO_STMT (stmt_info) : NULL,
> +			    misalign);
>        return (unsigned)
> -	(targetm.vectorize.builtin_vectorization_cost (kind, vectype, misalign)
> -	 * count);
> +	(builtin_vectorization_cost (kind, vectype, misalign) * count);
>  	 
>      }
>    else
> @@ -97,7 +97,8 @@ unsigned
>        else
>  	target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
>  
> -      return add_stmt_cost (target_cost_data, count, kind, stmt_info, misalign);
> +      return add_stmt_cost (target_cost_data, count, kind, stmt_info,
> +			    misalign, where);
>      }
>  }
>  
> @@ -795,11 +796,12 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info lo
>  
>  void
>  vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies,
> -			enum vect_def_type *dt, slp_tree slp_node,
> -			stmt_vector_for_cost *stmt_cost_vec)
> +			enum vect_def_type *dt,
> +			stmt_vector_for_cost *prologue_cost_vec,
> +			stmt_vector_for_cost *body_cost_vec)
>  {
>    int i;
> -  int inside_cost = 0, outside_cost = 0;
> +  int inside_cost = 0, prologue_cost = 0;
>  
>    /* The SLP costs were already calculated during SLP tree build.  */
>    if (PURE_SLP_STMT (stmt_info))
> @@ -807,21 +809,17 @@ vect_model_simple_cost (stmt_vec_info stmt_info, i
>  
>    /* FORNOW: Assuming maximum 2 args per stmts.  */
>    for (i = 0; i < 2; i++)
> -    {
> -      if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
> -	outside_cost += vect_get_stmt_cost (vector_stmt); 
> -    }
> +    if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
> +      prologue_cost += record_stmt_cost (prologue_cost_vec, 1, vector_stmt,
> +					 stmt_info, 0, vect_prologue);
>  
> -  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
> -  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
> -
>    /* Pass the inside-of-loop statements to the target-specific cost model.  */
> -  inside_cost = record_stmt_cost (stmt_cost_vec, ncopies, vector_stmt,
> -				  stmt_info, 0);
> +  inside_cost = record_stmt_cost (body_cost_vec, ncopies, vector_stmt,
> +				  stmt_info, 0, vect_body);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
> -             "outside_cost = %d .", inside_cost, outside_cost);
> +             "prologue_cost = %d .", inside_cost, prologue_cost);
>  }
>  
>  
> @@ -835,7 +833,7 @@ vect_model_promotion_demotion_cost (stmt_vec_info
>  				    enum vect_def_type *dt, int pwr)
>  {
>    int i, tmp;
> -  int inside_cost = 0, outside_cost = 0;
> +  int inside_cost = 0, prologue_cost = 0;
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>    void *target_cost_data;
> @@ -854,22 +852,19 @@ vect_model_promotion_demotion_cost (stmt_vec_info
>        tmp = (STMT_VINFO_TYPE (stmt_info) == type_promotion_vec_info_type) ?
>  	(i + 1) : i;
>        inside_cost += add_stmt_cost (target_cost_data, vect_pow2 (tmp),
> -				    vec_promote_demote, stmt_info, 0);
> +				    vec_promote_demote, stmt_info, 0,
> +				    vect_body);
>      }
>  
>    /* FORNOW: Assuming maximum 2 args per stmts.  */
>    for (i = 0; i < 2; i++)
> -    {
> -      if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
> -        outside_cost += vect_get_stmt_cost (vector_stmt);
> -    }
> +    if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
> +      prologue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
> +				      stmt_info, 0, vect_prologue);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_promotion_demotion_cost: inside_cost = %d, "
> -             "outside_cost = %d .", inside_cost, outside_cost);
> -
> -  /* Set the costs in STMT_INFO.  */
> -  stmt_vinfo_set_outside_of_loop_cost (stmt_info, NULL, outside_cost);
> +             "prologue_cost = %d .", inside_cost, prologue_cost);
>  }
>  
>  /* Function vect_cost_group_size
> @@ -898,10 +893,12 @@ vect_cost_group_size (stmt_vec_info stmt_info)
>  void
>  vect_model_store_cost (stmt_vec_info stmt_info, int ncopies,
>  		       bool store_lanes_p, enum vect_def_type dt,
> -		       slp_tree slp_node, stmt_vector_for_cost *stmt_cost_vec)
> +		       slp_tree slp_node,
> +		       stmt_vector_for_cost *prologue_cost_vec,
> +		       stmt_vector_for_cost *body_cost_vec)
>  {
>    int group_size;
> -  unsigned int inside_cost = 0, outside_cost = 0;
> +  unsigned int inside_cost = 0, prologue_cost = 0;
>    struct data_reference *first_dr;
>    gimple first_stmt;
>  
> @@ -910,7 +907,8 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
>      return;
>  
>    if (dt == vect_constant_def || dt == vect_external_def)
> -    outside_cost = vect_get_stmt_cost (scalar_to_vec); 
> +    prologue_cost += record_stmt_cost (prologue_cost_vec, 1, scalar_to_vec,
> +				       stmt_info, 0, vect_prologue);
>  
>    /* Grouped access?  */
>    if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> @@ -944,8 +942,8 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
>        /* Uses a high and low interleave operation for each needed permute.  */
>        
>        int nstmts = ncopies * exact_log2 (group_size) * group_size;
> -      inside_cost = record_stmt_cost (stmt_cost_vec, nstmts, vec_perm,
> -				      stmt_info, 0);
> +      inside_cost = record_stmt_cost (body_cost_vec, nstmts, vec_perm,
> +				      stmt_info, 0, vect_body);
>  
>        if (vect_print_dump_info (REPORT_COST))
>          fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d .",
> @@ -953,14 +951,11 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
>      }
>  
>    /* Costs of the stores.  */
> -  vect_get_store_cost (first_dr, ncopies, &inside_cost, stmt_cost_vec);
> +  vect_get_store_cost (first_dr, ncopies, &inside_cost, body_cost_vec);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
> -             "outside_cost = %d .", inside_cost, outside_cost);
> -
> -  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
> -  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
> +             "prologue_cost = %d .", inside_cost, prologue_cost);
>  }
>  
>  
> @@ -968,7 +963,7 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
>  void
>  vect_get_store_cost (struct data_reference *dr, int ncopies,
>  		     unsigned int *inside_cost,
> -		     stmt_vector_for_cost *stmt_cost_vec)
> +		     stmt_vector_for_cost *body_cost_vec)
>  {
>    int alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
>    gimple stmt = DR_STMT (dr);
> @@ -978,8 +973,9 @@ vect_get_store_cost (struct data_reference *dr, in
>      {
>      case dr_aligned:
>        {
> -	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> -					  vector_store, stmt_info, 0);
> +	*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> +					  vector_store, stmt_info, 0,
> +					  vect_body);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_store_cost: aligned.");
> @@ -990,9 +986,9 @@ vect_get_store_cost (struct data_reference *dr, in
>      case dr_unaligned_supported:
>        {
>          /* Here, we assign an additional cost for the unaligned store.  */
> -	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +	*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
>  					  unaligned_store, stmt_info,
> -					  DR_MISALIGNMENT (dr));
> +					  DR_MISALIGNMENT (dr), vect_body);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_store_cost: unaligned supported by "
> @@ -1025,13 +1021,15 @@ vect_get_store_cost (struct data_reference *dr, in
>     access scheme chosen.  */
>  
>  void
> -vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, bool load_lanes_p,
> -		      slp_tree slp_node, stmt_vector_for_cost *stmt_cost_vec)
> +vect_model_load_cost (stmt_vec_info stmt_info, int ncopies,
> +		      bool load_lanes_p, slp_tree slp_node,
> +		      stmt_vector_for_cost *prologue_cost_vec,
> +		      stmt_vector_for_cost *body_cost_vec)
>  {
>    int group_size;
>    gimple first_stmt;
>    struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
> -  unsigned int inside_cost = 0, outside_cost = 0;
> +  unsigned int inside_cost = 0, prologue_cost = 0;
>  
>    /* The SLP costs were already calculated during SLP tree build.  */
>    if (PURE_SLP_STMT (stmt_info))
> @@ -1059,8 +1057,8 @@ void
>      {
>        /* Uses an even and odd extract operations for each needed permute.  */
>        int nstmts = ncopies * exact_log2 (group_size) * group_size;
> -      inside_cost += record_stmt_cost (stmt_cost_vec, nstmts, vec_perm,
> -				       stmt_info, 0);
> +      inside_cost += record_stmt_cost (body_cost_vec, nstmts, vec_perm,
> +				       stmt_info, 0, vect_body);
>  
>        if (vect_print_dump_info (REPORT_COST))
>          fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
> @@ -1072,24 +1070,22 @@ void
>      {
>        /* N scalar loads plus gathering them into a vector.  */
>        tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> -      inside_cost += record_stmt_cost (stmt_cost_vec,
> +      inside_cost += record_stmt_cost (body_cost_vec,
>  				       ncopies * TYPE_VECTOR_SUBPARTS (vectype),
> -				       scalar_load, stmt_info, 0);
> -      inside_cost += record_stmt_cost (stmt_cost_vec, ncopies, vec_construct,
> -				       stmt_info, 0);
> +				       scalar_load, stmt_info, 0, vect_body);
> +      inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
> +				       stmt_info, 0, vect_body);
>      }
>    else
>      vect_get_load_cost (first_dr, ncopies,
>  			((!STMT_VINFO_GROUPED_ACCESS (stmt_info))
>  			 || group_size > 1 || slp_node),
> -			&inside_cost, &outside_cost, stmt_cost_vec);
> +			&inside_cost, &prologue_cost, 
> +			prologue_cost_vec, body_cost_vec, true);
>  
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
> -             "outside_cost = %d .", inside_cost, outside_cost);
> -
> -  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
> -  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
> +             "prologue_cost = %d .", inside_cost, prologue_cost);
>  }
>  
>  
> @@ -1097,8 +1093,10 @@ void
>  void
>  vect_get_load_cost (struct data_reference *dr, int ncopies,
>  		    bool add_realign_cost, unsigned int *inside_cost,
> -		    unsigned int *outside_cost,
> -		    stmt_vector_for_cost *stmt_cost_vec)
> +		    unsigned int *prologue_cost,
> +		    stmt_vector_for_cost *prologue_cost_vec,
> +		    stmt_vector_for_cost *body_cost_vec,
> +		    bool record_prologue_costs)
>  {
>    int alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
>    gimple stmt = DR_STMT (dr);
> @@ -1108,8 +1106,8 @@ vect_get_load_cost (struct data_reference *dr, int
>      {
>      case dr_aligned:
>        {
> -	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> -					  vector_load, stmt_info, 0);
> +	*inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
> +					  stmt_info, 0, vect_body);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_load_cost: aligned.");
> @@ -1119,9 +1117,9 @@ vect_get_load_cost (struct data_reference *dr, int
>      case dr_unaligned_supported:
>        {
>          /* Here, we assign an additional cost for the unaligned load.  */
> -	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> +	*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
>  					  unaligned_load, stmt_info,
> -					  DR_MISALIGNMENT (dr));
> +					  DR_MISALIGNMENT (dr), vect_body);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_load_cost: unaligned supported by "
> @@ -1131,17 +1129,17 @@ vect_get_load_cost (struct data_reference *dr, int
>        }
>      case dr_explicit_realign:
>        {
> -	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies * 2,
> -					  vector_load, stmt_info, 0);
> -	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> -					  vec_perm, stmt_info, 0);
> +	*inside_cost += record_stmt_cost (body_cost_vec, ncopies * 2,
> +					  vector_load, stmt_info, 0, vect_body);
> +	*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> +					  vec_perm, stmt_info, 0, vect_body);
>  
>          /* FIXME: If the misalignment remains fixed across the iterations of
>             the containing loop, the following cost should be added to the
> -           outside costs.  */
> +           prologue costs.  */
>          if (targetm.vectorize.builtin_mask_for_load)
> -	  *inside_cost += record_stmt_cost (stmt_cost_vec, 1, vector_stmt,
> -					    stmt_info, 0);
> +	  *inside_cost += record_stmt_cost (body_cost_vec, 1, vector_stmt,
> +					    stmt_info, 0, vect_body);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump, "vect_model_load_cost: explicit realign");
> @@ -1161,17 +1159,21 @@ vect_get_load_cost (struct data_reference *dr, int
>             access in the group.  Inside the loop, there is a load op
>             and a realignment op.  */
>  
> -        if (add_realign_cost)
> +        if (add_realign_cost && record_prologue_costs)
>            {
> -            *outside_cost = 2 * vect_get_stmt_cost (vector_stmt);
> +	    *prologue_cost += record_stmt_cost (prologue_cost_vec, 2,
> +						vector_stmt, stmt_info,
> +						0, vect_prologue);
>              if (targetm.vectorize.builtin_mask_for_load)
> -              *outside_cost += vect_get_stmt_cost (vector_stmt);
> +	      *prologue_cost += record_stmt_cost (prologue_cost_vec, 1,
> +						  vector_stmt, stmt_info,
> +						  0, vect_prologue);
>            }
>  
> -	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> -					  vector_load, stmt_info, 0);
> -	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
> -					  vec_perm, stmt_info, 0);
> +	*inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
> +					  stmt_info, 0, vect_body);
> +	*inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_perm,
> +					  stmt_info, 0, vect_body);
>  
>          if (vect_print_dump_info (REPORT_COST))
>            fprintf (vect_dump,
> @@ -3879,7 +3881,8 @@ vectorizable_store (gimple stmt, gimple_stmt_itera
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
> -      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL, NULL);
> +      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt,
> +			     NULL, NULL, NULL);
>        return true;
>      }
>  
> @@ -4435,7 +4438,7 @@ vectorizable_load (gimple stmt, gimple_stmt_iterat
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
> -      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL, NULL);
> +      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL, NULL, NULL);
>        return true;
>      }
>  
> @@ -5875,7 +5878,6 @@ new_stmt_vec_info (gimple stmt, loop_vec_info loop
>      STMT_VINFO_DEF_TYPE (res) = vect_internal_def;
>  
>    STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
> -  STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
>    STMT_SLP_TYPE (res) = loop_vect;
>    GROUP_FIRST_ELEMENT (res) = NULL;
>    GROUP_NEXT_ELEMENT (res) = NULL;
> Index: gcc/config/spu/spu.c
> ===================================================================
> --- gcc/config/spu/spu.c	(revision 189574)
> +++ gcc/config/spu/spu.c	(working copy)
> @@ -6623,8 +6623,8 @@ spu_builtin_vectorization_cost (enum vect_cost_for
>  static void *
>  spu_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
>  {
> -  unsigned *cost = XNEW (unsigned);
> -  *cost = 0;
> +  unsigned *cost = XNEWVEC (unsigned, 3);
> +  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
>    return cost;
>  }
>  
> @@ -6632,24 +6632,25 @@ spu_init_cost (struct loop *loop_info ATTRIBUTE_UN
>  
>  static unsigned
>  spu_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> -		   struct _stmt_vec_info *stmt_info, int misalign)
> +		   struct _stmt_vec_info *stmt_info, int misalign,
> +		   enum vect_cost_model_location where)
>  {
>    unsigned *cost = (unsigned *) data;
>    unsigned retval = 0;
>  
>    if (flag_vect_cost_model)
>      {
> -      tree vectype = stmt_vectype (stmt_info);
> +      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>        int stmt_cost = spu_builtin_vectorization_cost (kind, vectype, misalign);
>  
>        /* Statements in an inner loop relative to the loop being
>  	 vectorized are weighted more heavily.  The value here is
>  	 arbitrary and could potentially be improved with analysis.  */
> -      if (stmt_in_inner_loop_p (stmt_info))
> +      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
>  	count *= 50;  /* FIXME.  */
>  
>        retval = (unsigned) (count * stmt_cost);
> -      *cost += retval;
> +      cost[where] += retval;
>      }
>  
>    return retval;
> @@ -6657,10 +6658,14 @@ spu_add_stmt_cost (void *data, int count, enum vec
>  
>  /* Implement targetm.vectorize.finish_cost.  */
>  
> -static unsigned
> -spu_finish_cost (void *data)
> +static void
> +spu_finish_cost (void *data, unsigned *prologue_cost,
> +		 unsigned *body_cost, unsigned *epilogue_cost)
>  {
> -  return *((unsigned *) data);
> +  unsigned *cost = (unsigned *) data;
> +  *prologue_cost = cost[vect_prologue];
> +  *body_cost     = cost[vect_body];
> +  *epilogue_cost = cost[vect_epilogue];
>  }
>  
>  /* Implement targetm.vectorize.destroy_cost_data.  */
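
For readers following the hook changes: the three target implementations (spu, i386, rs6000) share the same shape. The cost data becomes a three-element array indexed by location, add_stmt_cost adds into the bucket selected by `where`, and finish_cost hands back the three totals through out-parameters. Below is a minimal standalone sketch of that pattern, not the GCC code itself. The `sketch_*` names, the `cost_location` enum, and the `in_inner_loop` flag are hypothetical stand-ins for the target hooks, `vect_cost_model_location`, and `stmt_in_inner_loop_p ()`. The per-statement cost is passed in directly rather than looked up via builtin_vectorization_cost.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's vect_cost_model_location.  */
enum cost_location { COST_PROLOGUE = 0, COST_BODY = 1, COST_EPILOGUE = 2 };

/* Allocate one zeroed accumulator per location.  */
void *
sketch_init_cost (void)
{
  return calloc (3, sizeof (unsigned));
}

/* Add COUNT statements of unit cost STMT_COST to the bucket for WHERE.
   Only body statements inside an inner loop get the extra weight,
   mirroring the "where == vect_body" guard in the patch.  */
unsigned
sketch_add_stmt_cost (void *data, int count, int stmt_cost,
                      int in_inner_loop, enum cost_location where)
{
  unsigned *cost = (unsigned *) data;
  unsigned retval;

  assert (cost != NULL && (unsigned) where <= COST_EPILOGUE);

  if (where == COST_BODY && in_inner_loop)
    count *= 50;  /* Same arbitrary weight as in the patch.  */

  retval = (unsigned) (count * stmt_cost);
  cost[where] += retval;
  return retval;
}

/* Report the three accumulated totals through out-parameters.  */
void
sketch_finish_cost (void *data, unsigned *prologue_cost,
                    unsigned *body_cost, unsigned *epilogue_cost)
{
  unsigned *cost = (unsigned *) data;
  *prologue_cost = cost[COST_PROLOGUE];
  *body_cost     = cost[COST_BODY];
  *epilogue_cost = cost[COST_EPILOGUE];
}

/* Release the accumulator array.  */
void
sketch_destroy_cost_data (void *data)
{
  free (data);
}
```

A caller that still wants a single "outside" figure can add the prologue and epilogue totals after sketch_finish_cost returns, which is what vect_bb_vectorization_profitable_p does with vec_prologue_cost and vec_epilogue_cost in this patch.
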
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c	(revision 189574)
> +++ gcc/config/i386/i386.c	(working copy)
> @@ -40070,8 +40070,8 @@ ix86_autovectorize_vector_sizes (void)
>  static void *
>  ix86_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
>  {
> -  unsigned *cost = XNEW (unsigned);
> -  *cost = 0;
> +  unsigned *cost = XNEWVEC (unsigned, 3);
> +  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
>    return cost;
>  }
>  
> @@ -40079,24 +40079,25 @@ ix86_init_cost (struct loop *loop_info ATTRIBUTE_U
>  
>  static unsigned
>  ix86_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> -		    struct _stmt_vec_info *stmt_info, int misalign)
> +		    struct _stmt_vec_info *stmt_info, int misalign,
> +		    enum vect_cost_model_location where)
>  {
>    unsigned *cost = (unsigned *) data;
>    unsigned retval = 0;
>  
>    if (flag_vect_cost_model)
>      {
> -      tree vectype = stmt_vectype (stmt_info);
> +      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>        int stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
>  
>        /* Statements in an inner loop relative to the loop being
>  	 vectorized are weighted more heavily.  The value here is
>  	 arbitrary and could potentially be improved with analysis.  */
> -      if (stmt_in_inner_loop_p (stmt_info))
> +      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
>  	count *= 50;  /* FIXME.  */
>  
>        retval = (unsigned) (count * stmt_cost);
> -      *cost += retval;
> +      cost[where] += retval;
>      }
>  
>    return retval;
> @@ -40104,10 +40105,14 @@ ix86_add_stmt_cost (void *data, int count, enum ve
>  
>  /* Implement targetm.vectorize.finish_cost.  */
>  
> -static unsigned
> -ix86_finish_cost (void *data)
> +static void
> +ix86_finish_cost (void *data, unsigned *prologue_cost,
> +		  unsigned *body_cost, unsigned *epilogue_cost)
>  {
> -  return *((unsigned *) data);
> +  unsigned *cost = (unsigned *) data;
> +  *prologue_cost = cost[vect_prologue];
> +  *body_cost     = cost[vect_body];
> +  *epilogue_cost = cost[vect_epilogue];
>  }
>  
>  /* Implement targetm.vectorize.destroy_cost_data.  */
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 189574)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -3525,8 +3525,8 @@ rs6000_preferred_simd_mode (enum machine_mode mode
>  static void *
>  rs6000_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
>  {
> -  unsigned *cost = XNEW (unsigned);
> -  *cost = 0;
> +  unsigned *cost = XNEWVEC (unsigned, 3);
> +  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
>    return cost;
>  }
>  
> @@ -3534,24 +3534,25 @@ rs6000_init_cost (struct loop *loop_info ATTRIBUTE
>  
>  static unsigned
>  rs6000_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
> -		      struct _stmt_vec_info *stmt_info, int misalign)
> +		      struct _stmt_vec_info *stmt_info, int misalign,
> +		      enum vect_cost_model_location where)
>  {
>    unsigned *cost = (unsigned *) data;
>    unsigned retval = 0;
>  
>    if (flag_vect_cost_model)
>      {
> -      tree vectype = stmt_vectype (stmt_info);
> +      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>        int stmt_cost = rs6000_builtin_vectorization_cost (kind, vectype,
>  							 misalign);
>        /* Statements in an inner loop relative to the loop being
>  	 vectorized are weighted more heavily.  The value here is
>  	 arbitrary and could potentially be improved with analysis.  */
> -      if (stmt_in_inner_loop_p (stmt_info))
> +      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
>  	count *= 50;  /* FIXME.  */
>  
>        retval = (unsigned) (count * stmt_cost);
> -      *cost += retval;
> +      cost[where] += retval;
>      }
>  
>    return retval;
> @@ -3559,10 +3560,14 @@ rs6000_add_stmt_cost (void *data, int count, enum
>  
>  /* Implement targetm.vectorize.finish_cost.  */
>  
> -static unsigned
> -rs6000_finish_cost (void *data)
> +static void
> +rs6000_finish_cost (void *data, unsigned *prologue_cost,
> +		    unsigned *body_cost, unsigned *epilogue_cost)
>  {
> -  return *((unsigned *) data);
> +  unsigned *cost = (unsigned *) data;
> +  *prologue_cost = cost[vect_prologue];
> +  *body_cost     = cost[vect_body];
> +  *epilogue_cost = cost[vect_epilogue];
>  }
>  
>  /* Implement targetm.vectorize.destroy_cost_data.  */
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c	(revision 189574)
> +++ gcc/tree-vect-slp.c	(working copy)
> @@ -93,7 +93,7 @@ vect_free_slp_instance (slp_instance instance)
>    vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
>    VEC_free (int, heap, SLP_INSTANCE_LOAD_PERMUTATION (instance));
>    VEC_free (slp_tree, heap, SLP_INSTANCE_LOADS (instance));
> -  VEC_free (stmt_info_for_cost, heap, SLP_INSTANCE_STMT_COST_VEC (instance));
> +  VEC_free (stmt_info_for_cost, heap, SLP_INSTANCE_BODY_COST_VEC (instance));
>  }
>  
>  
> @@ -121,7 +121,6 @@ vect_create_new_slp_node (VEC (gimple, heap) *scal
>    SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
>    SLP_TREE_VEC_STMTS (node) = NULL;
>    SLP_TREE_CHILDREN (node) = VEC_alloc (slp_void_p, heap, nops);
> -  SLP_TREE_OUTSIDE_OF_LOOP_COST (node) = 0;
>  
>    return node;
>  }
> @@ -179,7 +178,8 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vi
>                               slp_tree slp_node, gimple stmt,
>  			     int ncopies_for_cost, bool first,
>                               VEC (slp_oprnd_info, heap) **oprnds_info,
> -			     stmt_vector_for_cost *stmt_cost_vec)
> +			     stmt_vector_for_cost *prologue_cost_vec,
> +			     stmt_vector_for_cost *body_cost_vec)
>  {
>    tree oprnd;
>    unsigned int i, number_of_oprnds;
> @@ -320,7 +320,8 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vi
>  	      if (REFERENCE_CLASS_P (lhs))
>  		/* Store.  */
>                  vect_model_store_cost (stmt_info, ncopies_for_cost, false,
> -				       dt, slp_node, stmt_cost_vec);
> +				       dt, slp_node, prologue_cost_vec,
> +				       body_cost_vec);
>  	      else
>  		{
>  		  enum vect_def_type dts[2];
> @@ -329,7 +330,7 @@ vect_get_and_check_slp_defs (loop_vec_info loop_vi
>  		  /* Not memory operation (we don't call this function for
>  		     loads).  */
>  		  vect_model_simple_cost (stmt_info, ncopies_for_cost, dts,
> -					  slp_node, stmt_cost_vec);
> +					  prologue_cost_vec, body_cost_vec);
>  		}
>  	    }
>  	}
> @@ -451,7 +452,8 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>                       VEC (int, heap) **load_permutation,
>                       VEC (slp_tree, heap) **loads,
>                       unsigned int vectorization_factor, bool *loads_permuted,
> -		     stmt_vector_for_cost *stmt_cost_vec)
> +		     stmt_vector_for_cost *prologue_cost_vec,
> +		     stmt_vector_for_cost *body_cost_vec)
>  {
>    unsigned int i;
>    VEC (gimple, heap) *stmts = SLP_TREE_SCALAR_STMTS (*node);
> @@ -712,7 +714,8 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>  	      if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node,
>  						stmt, ncopies_for_cost,
>  						(i == 0), &oprnds_info,
> -						stmt_cost_vec))
> +						prologue_cost_vec,
> +						body_cost_vec))
>  		{
>  	  	  vect_free_oprnd_info (&oprnds_info);
>   		  return false;
> @@ -802,7 +805,7 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>                    /* Analyze costs (for the first stmt in the group).  */
>                    vect_model_load_cost (vinfo_for_stmt (stmt),
>                                          ncopies_for_cost, false, *node,
> -					stmt_cost_vec);
> +					prologue_cost_vec, body_cost_vec);
>                  }
>  
>                /* Store the place of this load in the interleaving chain.  In
> @@ -876,7 +879,8 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>  	  /* Find the def-stmts.  */
>  	  if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node, stmt,
>  					    ncopies_for_cost, (i == 0),
> -					    &oprnds_info, stmt_cost_vec))
> +					    &oprnds_info, prologue_cost_vec,
> +					    body_cost_vec))
>  	    {
>  	      vect_free_oprnd_info (&oprnds_info);
>  	      return false;
> @@ -884,9 +888,6 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>  	}
>      }
>  
> -  /* Add the costs of the node to the overall instance costs.  */
> -  *outside_cost += SLP_TREE_OUTSIDE_OF_LOOP_COST (*node);
> -
>    /* Grouped loads were reached - stop the recursion.  */
>    if (stop_recursion)
>      {
> @@ -895,8 +896,8 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>          {
>  	  gimple first_stmt = VEC_index (gimple, stmts, 0);
>            *loads_permuted = true;
> -	  (void) record_stmt_cost (stmt_cost_vec, group_size, vec_perm, 
> -				   vinfo_for_stmt (first_stmt), 0);
> +	  (void) record_stmt_cost (body_cost_vec, group_size, vec_perm, 
> +				   vinfo_for_stmt (first_stmt), 0, vect_body);
>          }
>        else
>          {
> @@ -925,7 +926,7 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
>  				   outside_cost, ncopies_for_cost,
>  				   max_nunits, load_permutation, loads,
>  				   vectorization_factor, loads_permuted,
> -				   stmt_cost_vec))
> +				   prologue_cost_vec, body_cost_vec))
>          {
>  	  if (child)
>  	    oprnd_info->def_stmts = NULL;
> @@ -1470,7 +1471,8 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>    struct data_reference *dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
>    bool loads_permuted = false;
>    VEC (gimple, heap) *scalar_stmts;
> -  stmt_vector_for_cost stmt_cost_vec;
> +  stmt_vector_for_cost body_cost_vec, prologue_cost_vec;
> +  stmt_info_for_cost *si;
>  
>    if (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)))
>      {
> @@ -1556,15 +1558,19 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>  
>    load_permutation = VEC_alloc (int, heap, group_size * group_size);
>    loads = VEC_alloc (slp_tree, heap, group_size);
> -  stmt_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 10);
> +  prologue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 10);
> +  body_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 10);
>  
>    /* Build the tree for the SLP instance.  */
>    if (vect_build_slp_tree (loop_vinfo, bb_vinfo, &node, group_size,
>                             &outside_cost, ncopies_for_cost,
>  			   &max_nunits, &load_permutation, &loads,
>  			   vectorization_factor, &loads_permuted,
> -			   &stmt_cost_vec))
> +			   &prologue_cost_vec, &body_cost_vec))
>      {
> +      void *data = (loop_vinfo ? LOOP_VINFO_TARGET_COST_DATA (loop_vinfo)
> +		    : BB_VINFO_TARGET_COST_DATA (bb_vinfo));
> +
>        /* Calculate the unrolling factor based on the smallest type.  */
>        if (max_nunits > nunits)
>          unrolling_factor = least_common_multiple (max_nunits, group_size)
> @@ -1575,7 +1581,8 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>            if (vect_print_dump_info (REPORT_SLP))
>              fprintf (vect_dump, "Build SLP failed: unrolling required in basic"
>                                 " block SLP");
> -	  VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
> +	  VEC_free (stmt_info_for_cost, heap, body_cost_vec);
> +	  VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
>            return false;
>          }
>  
> @@ -1584,8 +1591,7 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>        SLP_INSTANCE_TREE (new_instance) = node;
>        SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
>        SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
> -      SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (new_instance) = outside_cost;
> -      SLP_INSTANCE_STMT_COST_VEC (new_instance) = stmt_cost_vec;
> +      SLP_INSTANCE_BODY_COST_VEC (new_instance) = body_cost_vec;
>        SLP_INSTANCE_LOADS (new_instance) = loads;
>        SLP_INSTANCE_FIRST_LOAD_STMT (new_instance) = NULL;
>        SLP_INSTANCE_LOAD_PERMUTATION (new_instance) = load_permutation;
> @@ -1603,6 +1609,7 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>                  }
>  
>                vect_free_slp_instance (new_instance);
> +	      VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
>                return false;
>              }
>  
> @@ -1612,6 +1619,19 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>        else
>          VEC_free (int, heap, SLP_INSTANCE_LOAD_PERMUTATION (new_instance));
>  
> +      /* Record the prologue costs, which were delayed until we were
> +	 sure that SLP was successful.  Unlike the body costs, we know
> +	 the final values now regardless of the loop vectorization factor.  */
> +      FOR_EACH_VEC_ELT (stmt_info_for_cost, prologue_cost_vec, i, si)
> +	{
> +	  struct _stmt_vec_info *stmt_info
> +	    = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
> +	  (void) add_stmt_cost (data, si->count, si->kind, stmt_info,
> +				si->misalign, vect_prologue);
> +	}
> +
> +      VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
> +
>        if (loop_vinfo)
>          VEC_safe_push (slp_instance, heap,
>                         LOOP_VINFO_SLP_INSTANCES (loop_vinfo),
> @@ -1626,7 +1646,10 @@ vect_analyze_slp_instance (loop_vec_info loop_vinf
>        return true;
>      }
>    else
> -    VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
> +    {
> +      VEC_free (stmt_info_for_cost, heap, body_cost_vec);
> +      VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
> +    }
>  
>    /* Failed to SLP.  */
>    /* Free the allocated memory.  */
> @@ -1932,26 +1955,27 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb
>    slp_instance instance;
>    int i, j;
>    unsigned int vec_inside_cost = 0, vec_outside_cost = 0, scalar_cost = 0;
> +  unsigned int vec_prologue_cost = 0, vec_epilogue_cost = 0;
>    unsigned int stmt_cost;
>    gimple stmt;
>    gimple_stmt_iterator si;
>    basic_block bb = BB_VINFO_BB (bb_vinfo);
> +  void *target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
>    stmt_vec_info stmt_info = NULL;
> -  tree dummy_type = NULL;
> -  int dummy = 0;
> -  stmt_vector_for_cost stmt_cost_vec;
> +  stmt_vector_for_cost body_cost_vec;
>    stmt_info_for_cost *ci;
>  
>    /* Calculate vector costs.  */
>    FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
>      {
> -      vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
> -      stmt_cost_vec = SLP_INSTANCE_STMT_COST_VEC (instance);
> +      body_cost_vec = SLP_INSTANCE_BODY_COST_VEC (instance);
>  
> -      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, j, ci)
> -	(void) add_stmt_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo),
> -			      ci->count, ci->kind,
> -			      vinfo_for_stmt (ci->stmt), ci->misalign);
> +      FOR_EACH_VEC_ELT (stmt_info_for_cost, body_cost_vec, j, ci)
> +	{
> +	  stmt_info = ci->stmt ? vinfo_for_stmt (ci->stmt) : NULL;
> +	  (void) add_stmt_cost (target_cost_data, ci->count, ci->kind,
> +				stmt_info, ci->misalign, vect_body);
> +	}
>      }
>  
>    /* Calculate scalar cost.  */
> @@ -1967,29 +1991,29 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb
>        if (STMT_VINFO_DATA_REF (stmt_info))
>          {
>            if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)))
> -            stmt_cost = targetm.vectorize.builtin_vectorization_cost 
> -                          (scalar_load, dummy_type, dummy);
> +            stmt_cost = vect_get_stmt_cost (scalar_load);
>            else
> -            stmt_cost = targetm.vectorize.builtin_vectorization_cost
> -                          (scalar_store, dummy_type, dummy);
> +            stmt_cost = vect_get_stmt_cost (scalar_store);
>          }
>        else
> -        stmt_cost = targetm.vectorize.builtin_vectorization_cost
> -                      (scalar_stmt, dummy_type, dummy);
> +        stmt_cost = vect_get_stmt_cost (scalar_stmt);
>  
>        scalar_cost += stmt_cost;
>      }
>  
>    /* Complete the target-specific cost calculation.  */
> -  vec_inside_cost = finish_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo));
> +  finish_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo), &vec_prologue_cost,
> +	       &vec_inside_cost, &vec_epilogue_cost);
>  
> +  vec_outside_cost = vec_prologue_cost + vec_epilogue_cost;
> +
>    if (vect_print_dump_info (REPORT_COST))
>      {
>        fprintf (vect_dump, "Cost model analysis: \n");
>        fprintf (vect_dump, "  Vector inside of basic block cost: %d\n",
>                 vec_inside_cost);
> -      fprintf (vect_dump, "  Vector outside of basic block cost: %d\n",
> -               vec_outside_cost);
> +      fprintf (vect_dump, "  Vector prologue cost: %d\n", vec_prologue_cost);
> +      fprintf (vect_dump, "  Vector epilogue cost: %d\n", vec_epilogue_cost);
>        fprintf (vect_dump, "  Scalar cost of basic block: %d", scalar_cost);
>      }
>  
> @@ -2200,8 +2224,9 @@ vect_update_slp_costs_according_to_vf (loop_vec_in
>    unsigned int i, j, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>    VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
>    slp_instance instance;
> -  stmt_vector_for_cost stmt_cost_vec;
> +  stmt_vector_for_cost body_cost_vec;
>    stmt_info_for_cost *si;
> +  void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
>  
>    if (vect_print_dump_info (REPORT_SLP))
>      fprintf (vect_dump, "=== vect_update_slp_costs_according_to_vf ===");
> @@ -2214,12 +2239,12 @@ vect_update_slp_costs_according_to_vf (loop_vec_in
>        /* Record the instance's instructions in the target cost model.
>  	 This was delayed until here because the count of instructions
>  	 isn't known beforehand.  */
> -      stmt_cost_vec = SLP_INSTANCE_STMT_COST_VEC (instance);
> +      body_cost_vec = SLP_INSTANCE_BODY_COST_VEC (instance);
>  
> -      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, j, si)
> -	(void) add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> -			      si->count * ncopies, si->kind,
> -			      vinfo_for_stmt (si->stmt), si->misalign);
> +      FOR_EACH_VEC_ELT (stmt_info_for_cost, body_cost_vec, j, si)
> +	(void) add_stmt_cost (data, si->count * ncopies, si->kind,
> +			      vinfo_for_stmt (si->stmt), si->misalign,
> +			      vect_body);
>      }
>  }
>  
> 
> 
>
William J. Schmidt - July 24, 2012, 12:40 p.m.
On Tue, 2012-07-24 at 10:57 +0200, Richard Guenther wrote:
> On Mon, 23 Jul 2012, William J. Schmidt wrote:
> 
> > This patch completes the conversion of the vectorizer cost model to use
> > target hooks for recording vectorization information and calculating
> > costs.  Previous work handled the costs inside the loop body or basic
> > block being vectorized.  This patch similarly converts the prologue and
> > epilogue costs.
> > 
> > As before, I first verified that the new model provides the same results
> > as the old model on the regression testsuite and on SPEC CPU2006.  I
> > then removed the old model, rather than submitting an intermediate patch
> > with both present.  I have a patch that shows both if it's needed for
> > reference.
> > 
> > Also as before, I found an error in the old cost model wherein prologue
> > costs of phi reduction statements were not being considered during the
> > final vectorization decision.  I have fixed this in the new model; thus,
> > this version of the cost model will be slightly more conservative than
> > the original.  I am currently running SPEC tests to ensure there aren't
> > any resulting degradations.
> > 
> > One thing that could be done in future for further cleanup would be to
> > handle the scalar iteration cost in a similar manner.  Right now this is
> > dealt with by recording N scalar_stmts, where N is the length of the
> > scalar iteration; as with the old model, there is no attempt to
> > differentiate between different scalar statements.  This results in some
> > hackish stuff in, e.g., tree-vect-stmts.c:record_stmt_cost (), where we
> > have to deal with the fact that we may not have a stmt_info for the
> > statement being recorded.  This is only true for these aggregated
> > scalar_stmt costs.
> > 
> > Bootstrapped and tested on powerpc-unknown-linux-gnu with no new
> > regressions.  Assuming the SPEC performance tests come out ok, is this
> > ok for trunk?
> 
> So all costs we query from the backend even for the prologue/epilogue
> are costs for vector stmts (like inits of invariant vectors or
> outer-loop parts in outer loop vectorization)?

Yes, with the exception of copies of scalar iterations introduced by
loop peeling (the N * scalar_stmt business).

Comments in several places note opportunities to improve the
modeling, including for the outer-loop case, but apart from the peeled
scalar iterations your statement holds for now.
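
To illustrate the exception, here is a rough standalone sketch (not the
GCC code) of three accumulators indexed by location, with peeled scalar
iterations recorded as N * scalar_stmt.  The toy_* names, the statement
kinds, and the unit costs are made up for illustration; only the
prologue/body/epilogue split and the N * scalar_stmt recording follow
the patch.  Always charging two taken branches to the prologue is a
simplification.

```c
#include <assert.h>
#include <stdlib.h>

/* Mirrors the three locations of enum vect_cost_model_location.  */
enum toy_location { TOY_PROLOGUE = 0, TOY_BODY = 1, TOY_EPILOGUE = 2 };

/* Hypothetical statement kinds; the unit costs below are invented.  */
enum toy_kind { TOY_SCALAR_STMT, TOY_VECTOR_STMT, TOY_BRANCH_TAKEN };

static unsigned
toy_stmt_cost (enum toy_kind kind)
{
  return kind == TOY_BRANCH_TAKEN ? 3 : 1;
}

/* Allocate three zeroed accumulators, one per location.  */
static unsigned *
toy_init_cost (void)
{
  unsigned *cost = calloc (3, sizeof (unsigned));
  if (!cost)
    abort ();
  return cost;
}

/* Add COUNT copies of KIND to the accumulator selected by WHERE and
   return the amount added.  */
static unsigned
toy_add_stmt_cost (unsigned *cost, int count, enum toy_kind kind,
		   enum toy_location where)
{
  unsigned added = (unsigned) count * toy_stmt_cost (kind);
  assert (where <= TOY_EPILOGUE);
  cost[where] += added;
  return added;
}

/* Hand the three accumulated costs back to the caller.  */
static void
toy_finish_cost (const unsigned *cost, unsigned *prologue,
		 unsigned *body, unsigned *epilogue)
{
  *prologue = cost[TOY_PROLOGUE];
  *body = cost[TOY_BODY];
  *epilogue = cost[TOY_EPILOGUE];
}

/* Outside cost for a loop with known peeling.  There is no per-statement
   info for the peeled copies, so each peeled iteration is recorded as
   SCALAR_ITER_COST anonymous scalar statements.  */
unsigned
toy_outside_cost (int peel_prologue, int peel_epilogue,
		  int scalar_iter_cost)
{
  unsigned prologue, body, epilogue;
  unsigned *cost = toy_init_cost ();

  /* Guard branches, charged to the prologue.  */
  toy_add_stmt_cost (cost, 2, TOY_BRANCH_TAKEN, TOY_PROLOGUE);

  /* The N * scalar_stmt business.  */
  toy_add_stmt_cost (cost, peel_prologue * scalar_iter_cost,
		     TOY_SCALAR_STMT, TOY_PROLOGUE);
  toy_add_stmt_cost (cost, peel_epilogue * scalar_iter_cost,
		     TOY_SCALAR_STMT, TOY_EPILOGUE);

  /* Vector statements in the loop body do not affect the outside cost.  */
  toy_add_stmt_cost (cost, 4, TOY_VECTOR_STMT, TOY_BODY);

  toy_finish_cost (cost, &prologue, &body, &epilogue);
  free (cost);
  return prologue + epilogue;
}
```

With these invented unit costs, toy_outside_cost (1, 3, 5) charges
2 * 3 + 1 * 5 = 11 to the prologue and 3 * 5 = 15 to the epilogue, for
an outside cost of 26; the body accumulator does not contribute.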

Thanks,
Bill

> 
> Ok in that case.
> 
> Thanks,
> Richard.
> 
> > Thanks!
> > Bill
> >

Patch

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 189574)
+++ gcc/doc/tm.texi	(working copy)
@@ -5771,15 +5771,15 @@  The default is zero which means to not iterate ove
 @end deftypefn
 
 @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
-This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates an unsigned integer for accumulating a single cost.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
+This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
 @end deftypefn
 
-@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_ADD_STMT_COST (void *@var{data}, int @var{count}, enum vect_cost_for_stmt @var{kind}, struct _stmt_vec_info *@var{stmt_info}, int @var{misalign})
-This hook should update the target-specific @var{data} in response to adding @var{count} copies of the given @var{kind} of statement to the body of a loop or basic block.  The default adds the builtin vectorizer cost for the copies of the statement to the accumulator, and returns the amount added.  The return value should be viewed as a tentative cost that may later be overridden.
+@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_ADD_STMT_COST (void *@var{data}, int @var{count}, enum vect_cost_for_stmt @var{kind}, struct _stmt_vec_info *@var{stmt_info}, int @var{misalign}, enum vect_cost_model_location @var{where})
+This hook should update the target-specific @var{data} in response to adding @var{count} copies of the given @var{kind} of statement to a loop or basic block.  The default adds the builtin vectorizer cost for the copies of the statement to the accumulator specified by @var{where}, (the prologue, body, or epilogue) and returns the amount added.  The return value should be viewed as a tentative cost that may later be revised.
 @end deftypefn
 
-@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_FINISH_COST (void *@var{data})
-This hook should complete calculations of the cost of vectorizing a loop or basic block based on @var{data}, and return that cost as an unsigned integer.  The default returns the value of the accumulator.
+@deftypefn {Target Hook} void TARGET_VECTORIZE_FINISH_COST (void *@var{data}, unsigned *@var{prologue_cost}, unsigned *@var{body_cost}, unsigned *@var{epilogue_cost})
+This hook should complete calculations of the cost of vectorizing a loop or basic block based on @var{data}, and return the prologue, body, and epilogue costs as unsigned integers.  The default returns the value of the three accumulators.
 @end deftypefn
 
 @deftypefn {Target Hook} void TARGET_VECTORIZE_DESTROY_COST_DATA (void *@var{data})
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 189574)
+++ gcc/targhooks.c	(working copy)
@@ -996,54 +996,58 @@  default_autovectorize_vector_sizes (void)
   return 0;
 }
 
-/* By default, the cost model just accumulates the inside_loop costs for
-   a vectorized loop or block.  So allocate an unsigned int, set it to
-   zero, and return its address.  */
+/* By default, the cost model accumulates three separate costs (prologue,
+   loop body, and epilogue) for a vectorized loop or block.  So allocate an
+   array of three unsigned ints, set it to zero, and return its address.  */
 
 void *
 default_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
 {
-  unsigned *cost = XNEW (unsigned);
-  *cost = 0;
+  unsigned *cost = XNEWVEC (unsigned, 3);
+  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
   return cost;
 }
 
 /* By default, the cost model looks up the cost of the given statement
    kind and mode, multiplies it by the occurrence count, accumulates
-   it into the cost, and returns the cost added.  */
+   it into the cost specified by WHERE, and returns the cost added.  */
 
 unsigned
 default_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
-		       struct _stmt_vec_info *stmt_info, int misalign)
+		       struct _stmt_vec_info *stmt_info, int misalign,
+		       enum vect_cost_model_location where)
 {
   unsigned *cost = (unsigned *) data;
   unsigned retval = 0;
 
   if (flag_vect_cost_model)
     {
-      tree vectype = stmt_vectype (stmt_info);
+      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
       int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
 							  misalign);
       /* Statements in an inner loop relative to the loop being
 	 vectorized are weighted more heavily.  The value here is
 	 arbitrary and could potentially be improved with analysis.  */
-      if (stmt_in_inner_loop_p (stmt_info))
+      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
 	count *= 50;  /* FIXME.  */
 
       retval = (unsigned) (count * stmt_cost);
-      *cost += retval;
+      cost[where] += retval;
     }
 
   return retval;
 }
 
-/* By default, the cost model just returns the accumulated
-   inside_loop cost.  */
+/* By default, the cost model just returns the accumulated costs.  */
 
-unsigned
-default_finish_cost (void *data)
+void
+default_finish_cost (void *data, unsigned *prologue_cost,
+		     unsigned *body_cost, unsigned *epilogue_cost)
 {
-  return *((unsigned *) data);
+  unsigned *cost = (unsigned *) data;
+  *prologue_cost = cost[vect_prologue];
+  *body_cost     = cost[vect_body];
+  *epilogue_cost = cost[vect_epilogue];
 }
 
 /* Free the cost data.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 189574)
+++ gcc/targhooks.h	(working copy)
@@ -92,8 +92,9 @@  extern enum machine_mode default_preferred_simd_mo
 extern unsigned int default_autovectorize_vector_sizes (void);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
-				       struct _stmt_vec_info *, int);
-extern unsigned default_finish_cost (void *);
+				       struct _stmt_vec_info *, int,
+				       enum vect_cost_model_location);
+extern void default_finish_cost (void *, unsigned *, unsigned *, unsigned *);
 extern void default_destroy_cost_data (void *);
 
 /* These are here, and not in hooks.[ch], because not all users of
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 189574)
+++ gcc/target.def	(working copy)
@@ -1054,27 +1054,30 @@  DEFHOOK
 (init_cost,
  "This hook should initialize target-specific data structures in preparation "
  "for modeling the costs of vectorizing a loop or basic block.  The default "
- "allocates an unsigned integer for accumulating a single cost.  "
- "If @var{loop_info} is non-NULL, it identifies the loop being vectorized; "
- "otherwise a single block is being vectorized.",
+ "allocates three unsigned integers for accumulating costs for the prologue, "
+ "body, and epilogue of the loop or basic block.  If @var{loop_info} is "
+ "non-NULL, it identifies the loop being vectorized; otherwise a single block "
+ "is being vectorized.",
  void *,
  (struct loop *loop_info),
  default_init_cost)
 
 /* Target function to record N statements of the given kind using the
-   given vector type within the cost model data for the current loop
-   or block.  */
+   given vector type within the cost model data for the current loop or
+    block.  */
 DEFHOOK
 (add_stmt_cost,
  "This hook should update the target-specific @var{data} in response to "
- "adding @var{count} copies of the given @var{kind} of statement to the "
- "body of a loop or basic block.  The default adds the builtin vectorizer "
- "cost for the copies of the statement to the accumulator, and returns "
- "the amount added.  The return value should be viewed as a tentative "
- "cost that may later be overridden.",
+ "adding @var{count} copies of the given @var{kind} of statement to a "
+ "loop or basic block.  The default adds the builtin vectorizer cost for "
+ "the copies of the statement to the accumulator specified by @var{where}, "
+ "(the prologue, body, or epilogue) and returns the amount added.  The "
+ "return value should be viewed as a tentative cost that may later be "
+ "revised.",
  unsigned,
  (void *data, int count, enum vect_cost_for_stmt kind,
-  struct _stmt_vec_info *stmt_info, int misalign),
+  struct _stmt_vec_info *stmt_info, int misalign,
+  enum vect_cost_model_location where),
  default_add_stmt_cost)
 
 /* Target function to calculate the total cost of the current vectorized
@@ -1082,10 +1085,12 @@  DEFHOOK
 DEFHOOK
 (finish_cost,
  "This hook should complete calculations of the cost of vectorizing a loop "
- "or basic block based on @var{data}, and return that cost as an unsigned "
- "integer.  The default returns the value of the accumulator.",
- unsigned,
- (void *data),
+ "or basic block based on @var{data}, and return the prologue, body, and "
+ "epilogue costs as unsigned integers.  The default returns the value of "
+ "the three accumulators.",
+ void,
+ (void *data, unsigned *prologue_cost, unsigned *body_cost,
+  unsigned *epilogue_cost),
  default_finish_cost)
 
 /* Function to delete target-specific cost modeling data.  */
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 189574)
+++ gcc/target.h	(working copy)
@@ -157,6 +157,14 @@  enum vect_cost_for_stmt
   vec_construct
 };
 
+/* Separate locations for which the vectorizer cost model should
+   track costs.  */
+enum vect_cost_model_location {
+  vect_prologue = 0,
+  vect_body = 1,
+  vect_epilogue = 2
+};
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	(revision 189574)
+++ gcc/tree-vectorizer.h	(working copy)
@@ -118,11 +118,6 @@  typedef struct _slp_tree {
      scalar elements in one scalar iteration (GROUP_SIZE) multiplied by VF
      divided by vector size.  */
   unsigned int vec_stmts_size;
-  /* Vectorization costs associated with SLP node.  */
-  struct
-  {
-    int outside_of_loop;     /* Statements generated outside loop.  */
-  } cost;
 } *slp_tree;
 
 DEF_VEC_P(slp_tree);
@@ -141,14 +136,8 @@  typedef struct _slp_instance {
   unsigned int unrolling_factor;
 
   /* Vectorization costs associated with SLP instance.  */
-  struct
-  {
-    int outside_of_loop;     /* Statements generated outside loop.  */
-  } cost;
+  stmt_vector_for_cost body_cost_vec;
 
-  /* Inside-loop costs.  */
-  stmt_vector_for_cost stmt_cost_vec;
-
   /* Loads permutation relatively to the stores, NULL if there is no
      permutation.  */
   VEC (int, heap) *load_permutation;
@@ -168,8 +157,7 @@  DEF_VEC_ALLOC_P(slp_instance, heap);
 #define SLP_INSTANCE_TREE(S)                     (S)->root
 #define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
 #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
-#define SLP_INSTANCE_OUTSIDE_OF_LOOP_COST(S)     (S)->cost.outside_of_loop
-#define SLP_INSTANCE_STMT_COST_VEC(S)            (S)->stmt_cost_vec
+#define SLP_INSTANCE_BODY_COST_VEC(S)            (S)->body_cost_vec
 #define SLP_INSTANCE_LOAD_PERMUTATION(S)         (S)->load_permutation
 #define SLP_INSTANCE_LOADS(S)                    (S)->loads
 #define SLP_INSTANCE_FIRST_LOAD_STMT(S)          (S)->first_load
@@ -178,7 +166,6 @@  DEF_VEC_ALLOC_P(slp_instance, heap);
 #define SLP_TREE_SCALAR_STMTS(S)                 (S)->stmts
 #define SLP_TREE_VEC_STMTS(S)                    (S)->vec_stmts
 #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)          (S)->vec_stmts_size
-#define SLP_TREE_OUTSIDE_OF_LOOP_COST(S)         (S)->cost.outside_of_loop
 
 /* This structure is used in creation of an SLP tree.  Each instance
    corresponds to the same operand in a group of scalar stmts in an SLP
@@ -212,7 +199,7 @@  typedef struct _vect_peel_extended_info
   struct _vect_peel_info peel_info;
   unsigned int inside_cost;
   unsigned int outside_cost;
-  stmt_vector_for_cost stmt_cost_vec;
+  stmt_vector_for_cost body_cost_vec;
 } *vect_peel_extended_info;
 
 /*-----------------------------------------------------------------*/
@@ -566,12 +553,6 @@  typedef struct _stmt_vec_info {
      indicates whether the stmt needs to be vectorized.  */
   enum vect_relevant relevant;
 
-  /* Vectorization costs associated with statement.  */
-  struct
-  {
-    int outside_of_loop;     /* Statements generated outside loop.  */
-  } cost;
-
   /* The bb_vec_info with respect to which STMT is vectorized.  */
   bb_vec_info bb_vinfo;
 
@@ -628,7 +609,6 @@  typedef struct _stmt_vec_info {
 #define GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
 
 #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_scope)
-#define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
 
 #define HYBRID_SLP_STMT(S)                ((S)->slp_type == hybrid)
 #define PURE_SLP_STMT(S)                  ((S)->slp_type == pure_slp)
@@ -767,18 +747,6 @@  is_loop_header_bb_p (basic_block bb)
   return false;
 }
 
-/* Set outside loop vectorization cost.  */
-
-static inline void
-stmt_vinfo_set_outside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node,
-				     int cost)
-{
-  if (slp_node)
-    SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node) = cost;
-  else
-    STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
-}
-
 /* Return pow2 (X).  */
 
 static inline int
@@ -792,16 +760,22 @@  vect_pow2 (int x)
   return res;
 }
 
+/* Alias targetm.vectorize.builtin_vectorization_cost.  */
+
+static inline int
+builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
+			    tree vectype, int misalign)
+{
+  return targetm.vectorize.builtin_vectorization_cost (type_of_cost,
+						       vectype, misalign);
+}
+
 /* Get cost by calling cost target builtin.  */
 
 static inline
 int vect_get_stmt_cost (enum vect_cost_for_stmt type_of_cost)
 {
-  tree dummy_type = NULL;
-  int dummy = 0;
-
-  return targetm.vectorize.builtin_vectorization_cost (type_of_cost,
-                                                       dummy_type, dummy);
+  return builtin_vectorization_cost (type_of_cost, NULL, 0);
 }
 
 /* Alias targetm.vectorize.init_cost.  */
@@ -816,18 +790,20 @@  init_cost (struct loop *loop_info)
 
 static inline unsigned
 add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
-	       stmt_vec_info stmt_info, int misalign)
+	       stmt_vec_info stmt_info, int misalign,
+	       enum vect_cost_model_location where)
 {
   return targetm.vectorize.add_stmt_cost (data, count, kind,
-					  stmt_info, misalign);
+					  stmt_info, misalign, where);
 }
 
 /* Alias targetm.vectorize.finish_cost.  */
 
-static inline unsigned
-finish_cost (void *data)
+static inline void
+finish_cost (void *data, unsigned *prologue_cost,
+	     unsigned *body_cost, unsigned *epilogue_cost)
 {
-  return targetm.vectorize.finish_cost (data);
+  targetm.vectorize.finish_cost (data, prologue_cost, body_cost, epilogue_cost);
 }
 
 /* Alias targetm.vectorize.destroy_cost_data.  */
@@ -905,14 +881,18 @@  extern stmt_vec_info new_stmt_vec_info (gimple stm
 extern void free_stmt_vec_info (gimple stmt);
 extern tree vectorizable_function (gimple, tree, tree);
 extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
-                                    slp_tree, stmt_vector_for_cost *);
+                                    stmt_vector_for_cost *,
+				    stmt_vector_for_cost *);
 extern void vect_model_store_cost (stmt_vec_info, int, bool,
 				   enum vect_def_type, slp_tree,
+				   stmt_vector_for_cost *,
 				   stmt_vector_for_cost *);
 extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree,
+				  stmt_vector_for_cost *,
 				  stmt_vector_for_cost *);
 extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
-				  enum vect_cost_for_stmt, stmt_vec_info, int);
+				  enum vect_cost_for_stmt, stmt_vec_info,
+				  int, enum vect_cost_model_location);
 extern void vect_finish_stmt_generation (gimple, gimple,
                                          gimple_stmt_iterator *);
 extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
@@ -928,7 +908,8 @@  extern bool vectorizable_condition (gimple, gimple
                                     tree, int, slp_tree);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
 				unsigned int *, unsigned int *,
-				stmt_vector_for_cost *);
+				stmt_vector_for_cost *,
+				stmt_vector_for_cost *, bool);
 extern void vect_get_store_cost (struct data_reference *, int,
 				 unsigned int *, stmt_vector_for_cost *);
 extern bool vect_supportable_shift (enum tree_code, tree);
@@ -992,7 +973,9 @@  extern bool vectorizable_induction (gimple, gimple
 extern int vect_estimate_min_profitable_iters (loop_vec_info);
 extern tree get_initial_def_for_reduction (gimple, tree, tree *);
 extern int vect_min_worthwhile_factor (enum tree_code);
-extern int vect_get_known_peeling_cost (loop_vec_info, int, int *, int);
+extern int vect_get_known_peeling_cost (loop_vec_info, int, int *, int,
+					stmt_vector_for_cost *,
+					stmt_vector_for_cost *);
 extern int vect_get_single_scalar_iteration_cost (loop_vec_info);
 
 /* In tree-vect-slp.c.  */
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 189574)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -2440,9 +2440,11 @@  vect_get_single_scalar_iteration_cost (loop_vec_in
 int
 vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
                              int *peel_iters_epilogue,
-                             int scalar_single_iter_cost)
+                             int scalar_single_iter_cost,
+			     stmt_vector_for_cost *prologue_cost_vec,
+			     stmt_vector_for_cost *epilogue_cost_vec)
 {
-  int peel_guard_costs = 0;
+  int retval = 0;
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 
   if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
@@ -2455,7 +2457,8 @@  vect_get_known_peeling_cost (loop_vec_info loop_vi
 
       /* If peeled iterations are known but number of scalar loop
          iterations are unknown, count a taken branch per peeled loop.  */
-      peel_guard_costs =  2 * vect_get_stmt_cost (cond_branch_taken);
+      retval = record_stmt_cost (prologue_cost_vec, 2, cond_branch_taken,
+				 NULL, 0, vect_prologue);
     }
   else
     {
@@ -2469,9 +2472,15 @@  vect_get_known_peeling_cost (loop_vec_info loop_vi
         *peel_iters_epilogue = vf;
     }
 
-   return (peel_iters_prologue * scalar_single_iter_cost)
-            + (*peel_iters_epilogue * scalar_single_iter_cost)
-           + peel_guard_costs;
+  if (peel_iters_prologue)
+    retval += record_stmt_cost (prologue_cost_vec,
+				peel_iters_prologue * scalar_single_iter_cost,
+				scalar_stmt, NULL, 0, vect_prologue);
+  if (*peel_iters_epilogue)
+    retval += record_stmt_cost (epilogue_cost_vec,
+				*peel_iters_epilogue * scalar_single_iter_cost,
+				scalar_stmt, NULL, 0, vect_epilogue);
+  return retval;
 }
 
 /* Function vect_estimate_min_profitable_iters
@@ -2486,22 +2495,18 @@  vect_get_known_peeling_cost (loop_vec_info loop_vi
 int
 vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo)
 {
-  int i;
   int min_profitable_iters;
   int peel_iters_prologue;
   int peel_iters_epilogue;
-  int vec_inside_cost = 0;
+  unsigned vec_inside_cost = 0;
   int vec_outside_cost = 0;
+  unsigned vec_prologue_cost = 0;
+  unsigned vec_epilogue_cost = 0;
   int scalar_single_iter_cost = 0;
   int scalar_outside_cost = 0;
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes;
   int npeel = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
-  int peel_guard_costs = 0;
-  VEC (slp_instance, heap) *slp_instances;
-  slp_instance instance;
+  void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
 
   /* Cost model disabled.  */
   if (!flag_vect_cost_model)
@@ -2515,8 +2520,10 @@  vect_estimate_min_profitable_iters (loop_vec_info
   if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo))
     {
       /*  FIXME: Make cost depend on complexity of individual check.  */
-      vec_outside_cost +=
-	VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
+      unsigned len = VEC_length (gimple,
+				 LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
+      (void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0,
+			    vect_prologue);
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "cost model: Adding cost of checks for loop "
                  "versioning to treat misalignment.\n");
@@ -2526,8 +2533,9 @@  vect_estimate_min_profitable_iters (loop_vec_info
   if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
     {
       /*  FIXME: Make cost depend on complexity of individual check.  */
-      vec_outside_cost +=
-        VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
+      unsigned len = VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
+      (void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0,
+			    vect_prologue);
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "cost model: Adding cost of checks for loop "
                  "versioning aliasing.\n");
@@ -2535,7 +2543,8 @@  vect_estimate_min_profitable_iters (loop_vec_info
 
   if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
       || LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
-    vec_outside_cost += vect_get_stmt_cost (cond_branch_taken); 
+    (void) add_stmt_cost (target_cost_data, 1, cond_branch_taken, NULL, 0,
+			  vect_prologue);
 
   /* Count statements in scalar loop.  Using this as scalar cost for a single
      iteration for now.
@@ -2545,52 +2554,6 @@  vect_estimate_min_profitable_iters (loop_vec_info
      TODO: Consider assigning different costs to different scalar
      statements.  */
 
-  for (i = 0; i < nbbs; i++)
-    {
-      gimple_stmt_iterator si;
-      basic_block bb = bbs[i];
-
-      for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
-	{
-	  gimple stmt = gsi_stmt (si);
-	  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-
-	  if (STMT_VINFO_IN_PATTERN_P (stmt_info))
-	    {
-	      stmt = STMT_VINFO_RELATED_STMT (stmt_info);
-	      stmt_info = vinfo_for_stmt (stmt);
-	    }
-
-	  /* Skip stmts that are not vectorized inside the loop.  */
-	  if (!STMT_VINFO_RELEVANT_P (stmt_info)
-	      && (!STMT_VINFO_LIVE_P (stmt_info)
-                 || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))))
-	    continue;
-
-	  /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
-	     some of the "outside" costs are generated inside the outer-loop.  */
-	  vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
-          if (is_pattern_stmt_p (stmt_info)
-	      && STMT_VINFO_PATTERN_DEF_SEQ (stmt_info))
-            {
-	      gimple_stmt_iterator gsi;
-	      
-	      for (gsi = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
-		   !gsi_end_p (gsi); gsi_next (&gsi))
-                {
-                  gimple pattern_def_stmt = gsi_stmt (gsi);
-                  stmt_vec_info pattern_def_stmt_info
-		    = vinfo_for_stmt (pattern_def_stmt);
-                  if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)
-                      || STMT_VINFO_LIVE_P (pattern_def_stmt_info))
-		    vec_outside_cost
-		      += STMT_VINFO_OUTSIDE_OF_LOOP_COST
-		        (pattern_def_stmt_info);
-		}
-	    }
-	}
-    }
-
   scalar_single_iter_cost = vect_get_single_scalar_iteration_cost (loop_vinfo);
 
   /* Add additional cost for the peeled instructions in prologue and epilogue
@@ -2621,18 +2584,54 @@  vect_estimate_min_profitable_iters (loop_vec_info
          branch per peeled loop. Even if scalar loop iterations are known,
          vector iterations are not known since peeled prologue iterations are
          not known. Hence guards remain the same.  */
-      peel_guard_costs +=  2 * (vect_get_stmt_cost (cond_branch_taken)
-                                + vect_get_stmt_cost (cond_branch_not_taken));
-      vec_outside_cost += (peel_iters_prologue * scalar_single_iter_cost)
-                           + (peel_iters_epilogue * scalar_single_iter_cost)
-                           + peel_guard_costs;
+      (void) add_stmt_cost (target_cost_data, 2, cond_branch_taken,
+			    NULL, 0, vect_prologue);
+      (void) add_stmt_cost (target_cost_data, 2, cond_branch_not_taken,
+			    NULL, 0, vect_prologue);
+      /* FORNOW: Don't attempt to pass individual scalar instructions to
+	 the model; just assume linear cost for scalar iterations.  */
+      (void) add_stmt_cost (target_cost_data,
+			    peel_iters_prologue * scalar_single_iter_cost,
+			    scalar_stmt, NULL, 0, vect_prologue);
+      (void) add_stmt_cost (target_cost_data, 
+			    peel_iters_epilogue * scalar_single_iter_cost,
+			    scalar_stmt, NULL, 0, vect_epilogue);
     }
   else
     {
+      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+      stmt_info_for_cost *si;
+      int j;
+      void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
+
+      prologue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
+      epilogue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
       peel_iters_prologue = npeel;
-      vec_outside_cost += vect_get_known_peeling_cost (loop_vinfo,
-                                    peel_iters_prologue, &peel_iters_epilogue,
-                                    scalar_single_iter_cost);
+
+      (void) vect_get_known_peeling_cost (loop_vinfo, peel_iters_prologue,
+					  &peel_iters_epilogue,
+					  scalar_single_iter_cost,
+					  &prologue_cost_vec,
+					  &epilogue_cost_vec);
+
+      FOR_EACH_VEC_ELT (stmt_info_for_cost, prologue_cost_vec, j, si)
+	{
+	  struct _stmt_vec_info *stmt_info
+	    = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
+	  (void) add_stmt_cost (data, si->count, si->kind, stmt_info,
+				si->misalign, vect_prologue);
+	}
+
+      FOR_EACH_VEC_ELT (stmt_info_for_cost, epilogue_cost_vec, j, si)
+	{
+	  struct _stmt_vec_info *stmt_info
+	    = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
+	  (void) add_stmt_cost (data, si->count, si->kind, stmt_info,
+				si->misalign, vect_epilogue);
+	}
+
+      VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
+      VEC_free (stmt_info_for_cost, heap, epilogue_cost_vec);
     }
 
   /* FORNOW: The scalar outside cost is incremented in one of the
@@ -2708,14 +2707,11 @@  vect_estimate_min_profitable_iters (loop_vec_info
 	}
     }
 
-  /* Add SLP costs.  */
-  slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
-  FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
-    vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
+  /* Complete the target-specific cost calculations.  */
+  finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), &vec_prologue_cost,
+	       &vec_inside_cost, &vec_epilogue_cost);
 
-  /* Complete the target-specific cost calculation for the inside-of-loop
-     costs.  */
-  vec_inside_cost = finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
+  vec_outside_cost = (int) (vec_prologue_cost + vec_epilogue_cost);
   
   /* Calculate number of iterations required to make the vector version
      profitable, relative to the loop bodies only.  The following condition
@@ -2727,7 +2723,7 @@  vect_estimate_min_profitable_iters (loop_vec_info
      PL_ITERS = prologue iterations, EP_ITERS= epilogue iterations
      SOC = scalar outside cost for run time cost model check.  */
 
-  if ((scalar_single_iter_cost * vf) > vec_inside_cost)
+  if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost)
     {
       if (vec_outside_cost <= 0)
         min_profitable_iters = 1;
@@ -2740,8 +2736,8 @@  vect_estimate_min_profitable_iters (loop_vec_info
                                     - vec_inside_cost);
 
           if ((scalar_single_iter_cost * vf * min_profitable_iters)
-              <= ((vec_inside_cost * min_profitable_iters)
-                  + ((vec_outside_cost - scalar_outside_cost) * vf)))
+              <= (((int) vec_inside_cost * min_profitable_iters)
+                  + (((int) vec_outside_cost - scalar_outside_cost) * vf)))
             min_profitable_iters++;
         }
     }
@@ -2761,8 +2757,10 @@  vect_estimate_min_profitable_iters (loop_vec_info
       fprintf (vect_dump, "Cost model analysis: \n");
       fprintf (vect_dump, "  Vector inside of loop cost: %d\n",
 	       vec_inside_cost);
-      fprintf (vect_dump, "  Vector outside of loop cost: %d\n",
-	       vec_outside_cost);
+      fprintf (vect_dump, "  Vector prologue cost: %d\n",
+	       vec_prologue_cost);
+      fprintf (vect_dump, "  Vector epilogue cost: %d\n",
+	       vec_epilogue_cost);
       fprintf (vect_dump, "  Scalar iteration cost: %d\n",
 	       scalar_single_iter_cost);
       fprintf (vect_dump, "  Scalar outside cost: %d\n", scalar_outside_cost);
@@ -2803,7 +2801,7 @@  static bool
 vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
 			   int ncopies)
 {
-  int outer_cost = 0;
+  int prologue_cost = 0, epilogue_cost = 0;
   enum tree_code code;
   optab optab;
   tree vectype;
@@ -2812,12 +2810,11 @@  vect_model_reduction_cost (stmt_vec_info stmt_info
   enum machine_mode mode;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
 
   /* Cost of reduction op inside loop.  */
-  unsigned inside_cost
-    = add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
-		     ncopies, vector_stmt, stmt_info, 0);
-
+  unsigned inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt,
+					stmt_info, 0, vect_body);
   stmt = STMT_VINFO_STMT (stmt_info);
 
   switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
@@ -2859,7 +2856,8 @@  vect_model_reduction_cost (stmt_vec_info stmt_info
   code = gimple_assign_rhs_code (orig_stmt);
 
   /* Add in cost for initial definition.  */
-  outer_cost += vect_get_stmt_cost (scalar_to_vec);
+  prologue_cost += add_stmt_cost (target_cost_data, 1, scalar_to_vec,
+				  stmt_info, 0, vect_prologue);
 
   /* Determine cost of epilogue code.
 
@@ -2869,8 +2867,12 @@  vect_model_reduction_cost (stmt_vec_info stmt_info
   if (!nested_in_vect_loop_p (loop, orig_stmt))
     {
       if (reduc_code != ERROR_MARK)
-	outer_cost += vect_get_stmt_cost (vector_stmt) 
-                      + vect_get_stmt_cost (vec_to_scalar); 
+	{
+	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
+					  stmt_info, 0, vect_epilogue);
+	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vec_to_scalar,
+					  stmt_info, 0, vect_epilogue);
+	}
       else
 	{
 	  int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
@@ -2885,25 +2887,31 @@  vect_model_reduction_cost (stmt_vec_info stmt_info
 	  if (VECTOR_MODE_P (mode)
 	      && optab_handler (optab, mode) != CODE_FOR_nothing
 	      && optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
-	    /* Final reduction via vector shifts and the reduction operator. Also
-	       requires scalar extract.  */
-	    outer_cost += ((exact_log2(nelements) * 2) 
-              * vect_get_stmt_cost (vector_stmt) 
-  	      + vect_get_stmt_cost (vec_to_scalar));
+	    {
+	      /* Final reduction via vector shifts and the reduction operator.
+		 Also requires scalar extract.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data,
+					      exact_log2 (nelements) * 2,
+					      vector_stmt, stmt_info, 0,
+					      vect_epilogue);
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1,
+					      vec_to_scalar, stmt_info, 0,
+					      vect_epilogue);
+	    }
 	  else
-	    /* Use extracts and reduction op for final reduction.  For N elements,
-               we have N extracts and N-1 reduction ops.  */
-	    outer_cost += ((nelements + nelements - 1) 
-              * vect_get_stmt_cost (vector_stmt));
+	    /* Use extracts and reduction op for final reduction.  For N
+	       elements, we have N extracts and N-1 reduction ops.  */
+	    epilogue_cost += add_stmt_cost (target_cost_data,
+					    nelements + nelements - 1,
+					    vector_stmt, stmt_info, 0,
+					    vect_epilogue);
 	}
     }
 
-  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
-
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_reduction_cost: inside_cost = %d, "
-             "outside_cost = %d .", inside_cost,
-             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+             "prologue_cost = %d, epilogue_cost = %d .", inside_cost,
+	     prologue_cost, epilogue_cost);
 
   return true;
 }
@@ -2917,20 +2925,20 @@  static void
 vect_model_induction_cost (stmt_vec_info stmt_info, int ncopies)
 {
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
+  unsigned inside_cost, prologue_cost;
 
   /* loop cost for vec_loop.  */
-  unsigned inside_cost
-    = add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), ncopies,
-		     vector_stmt, stmt_info, 0);
+  inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt,
+			       stmt_info, 0, vect_body);
 
   /* prologue cost for vec_init and vec_step.  */
-  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)  
-    = 2 * vect_get_stmt_cost (scalar_to_vec);
+  prologue_cost = add_stmt_cost (target_cost_data, 2, scalar_to_vec,
+				 stmt_info, 0, vect_prologue);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_induction_cost: inside_cost = %d, "
-             "outside_cost = %d .", inside_cost,
-             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+             "prologue_cost = %d .", inside_cost, prologue_cost);
 }
 
 
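For readers following the tree-vect-loop.c changes above, here is a minimal
sketch of how the separately reported prologue and epilogue costs might feed
the profitability threshold.  It is a simplification and not the code in the
patch: the function name is hypothetical, the peel-iteration terms of the real
formula are omitted because they are not fully visible in these hunks, and the
-1 return for the unprofitable case is an assumption.

```c
#include <assert.h>

/* Hypothetical, simplified version of the threshold computation.
   Costs are in the same abstract units the target hooks return.  */
static int
sketch_min_profitable_iters (int scalar_single_iter_cost, int vf,
                             unsigned vec_prologue_cost,
                             unsigned vec_inside_cost,
                             unsigned vec_epilogue_cost,
                             int scalar_outside_cost)
{
  /* The outside cost is the sum of the two costs reported by finish_cost.  */
  int vec_outside_cost = (int) (vec_prologue_cost + vec_epilogue_cost);
  int inside = (int) vec_inside_cost;
  int min_iters;

  assert (vf > 0);

  /* Assumption: report -1 when one vector iteration is no cheaper than
     VF scalar iterations, so vectorization can never pay off.  */
  if (scalar_single_iter_cost * vf <= inside)
    return -1;

  if (vec_outside_cost <= 0)
    return 1;

  /* Simplified: the peel-iteration terms of the real numerator are
     omitted here.  */
  min_iters = ((vec_outside_cost - scalar_outside_cost) * vf)
              / (scalar_single_iter_cost * vf - inside);

  /* Bump by one if the scalar loop is still no more expensive at the
     truncated value, as the patch does after the division.  */
  if (scalar_single_iter_cost * vf * min_iters
      <= inside * min_iters + (vec_outside_cost - scalar_outside_cost) * vf)
    min_iters++;

  return min_iters;
}
```

With a scalar iteration cost of 4, VF of 4, prologue cost 12, inside cost 6,
epilogue cost 8 and no scalar outside cost, the division gives 8 and the
follow-up check bumps it to 9.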
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 189574)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -1204,10 +1204,11 @@  vector_alignment_reachable_p (struct data_referenc
 
 /* Calculate the cost of the memory access represented by DR.  */
 
-static stmt_vector_for_cost
+static void
 vect_get_data_access_cost (struct data_reference *dr,
                            unsigned int *inside_cost,
-                           unsigned int *outside_cost)
+                           unsigned int *outside_cost,
+			   stmt_vector_for_cost *body_cost_vec)
 {
   gimple stmt = DR_STMT (dr);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
@@ -1215,19 +1216,16 @@  vect_get_data_access_cost (struct data_reference *
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   int ncopies = vf / nunits;
-  stmt_vector_for_cost stmt_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
 
   if (DR_IS_READ (dr))
-    vect_get_load_cost (dr, ncopies, true, inside_cost,
-			outside_cost, &stmt_cost_vec);
+    vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost,
+			NULL, body_cost_vec, false);
   else
-    vect_get_store_cost (dr, ncopies, inside_cost, &stmt_cost_vec);
+    vect_get_store_cost (dr, ncopies, inside_cost, body_cost_vec);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
              "outside_cost = %d.", *inside_cost, *outside_cost);
-
-  return stmt_cost_vec;
 }
 
 
@@ -1320,8 +1318,13 @@  vect_peeling_hash_get_lowest_cost (void **slot, vo
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   VEC (data_reference_p, heap) *datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
   struct data_reference *dr;
-  stmt_vector_for_cost stmt_cost_vec = NULL;
+  stmt_vector_for_cost prologue_cost_vec, body_cost_vec, epilogue_cost_vec;
+  int single_iter_cost;
 
+  prologue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
+  body_cost_vec     = VEC_alloc (stmt_info_for_cost, heap, 2);
+  epilogue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 2);
+
   FOR_EACH_VEC_ELT (data_reference_p, datarefs, i, dr)
     {
       stmt = DR_STMT (dr);
@@ -1334,23 +1337,35 @@  vect_peeling_hash_get_lowest_cost (void **slot, vo
 
       save_misalignment = DR_MISALIGNMENT (dr);
       vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
-      stmt_cost_vec = vect_get_data_access_cost (dr, &inside_cost,
-						 &outside_cost);
+      vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
+				 &body_cost_vec);
       SET_DR_MISALIGNMENT (dr, save_misalignment);
     }
 
-  outside_cost += vect_get_known_peeling_cost (loop_vinfo, elem->npeel, &dummy,
-                         vect_get_single_scalar_iteration_cost (loop_vinfo));
+  single_iter_cost = vect_get_single_scalar_iteration_cost (loop_vinfo);
+  outside_cost += vect_get_known_peeling_cost (loop_vinfo, elem->npeel,
+					       &dummy, single_iter_cost,
+					       &prologue_cost_vec,
+					       &epilogue_cost_vec);
 
+  /* Prologue and epilogue costs are added to the target model later.
+     These costs depend only on the scalar iteration cost, the
+     number of peeling iterations finally chosen, and the number of
+     misaligned statements.  So discard the information found here.  */
+  VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
+  VEC_free (stmt_info_for_cost, heap, epilogue_cost_vec);
+
   if (inside_cost < min->inside_cost
       || (inside_cost == min->inside_cost && outside_cost < min->outside_cost))
     {
       min->inside_cost = inside_cost;
       min->outside_cost = outside_cost;
-      min->stmt_cost_vec = stmt_cost_vec;
+      min->body_cost_vec = body_cost_vec;
       min->peel_info.dr = elem->dr;
       min->peel_info.npeel = elem->npeel;
     }
+  else
+    VEC_free (stmt_info_for_cost, heap, body_cost_vec);
 
   return 1;
 }
@@ -1363,12 +1378,12 @@  vect_peeling_hash_get_lowest_cost (void **slot, vo
 static struct data_reference *
 vect_peeling_hash_choose_best_peeling (loop_vec_info loop_vinfo,
                                        unsigned int *npeel,
-				       stmt_vector_for_cost *stmt_cost_vec)
+				       stmt_vector_for_cost *body_cost_vec)
 {
    struct _vect_peel_extended_info res;
 
    res.peel_info.dr = NULL;
-   res.stmt_cost_vec = NULL;
+   res.body_cost_vec = NULL;
 
    if (flag_vect_cost_model)
      {
@@ -1385,7 +1400,7 @@  vect_peeling_hash_choose_best_peeling (loop_vec_in
      }
 
    *npeel = res.peel_info.npeel;
-   *stmt_cost_vec = res.stmt_cost_vec;
+   *body_cost_vec = res.body_cost_vec;
    return res.peel_info.dr;
 }
 
@@ -1502,7 +1517,7 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
   unsigned possible_npeel_number = 1;
   tree vectype;
   unsigned int nelements, mis, same_align_drs_max = 0;
-  stmt_vector_for_cost stmt_cost_vec = NULL;
+  stmt_vector_for_cost body_cost_vec = NULL;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ===");
@@ -1706,12 +1721,15 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
           unsigned int store_inside_cost = 0, store_outside_cost = 0;
           unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
           unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
+	  stmt_vector_for_cost dummy = VEC_alloc (stmt_info_for_cost, heap, 2);
 
-          (void) vect_get_data_access_cost (dr0, &load_inside_cost,
-					    &load_outside_cost);
-          (void) vect_get_data_access_cost (first_store, &store_inside_cost,
-					    &store_outside_cost);
+          vect_get_data_access_cost (dr0, &load_inside_cost, &load_outside_cost,
+				     &dummy);
+          vect_get_data_access_cost (first_store, &store_inside_cost,
+				     &store_outside_cost, &dummy);
 
+	  VEC_free (stmt_info_for_cost, heap, dummy);
+
           /* Calculate the penalty for leaving FIRST_STORE unaligned (by
              aligning the load DR0).  */
           load_inside_penalty = store_inside_cost;
@@ -1775,7 +1793,7 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
 
       /* Choose the best peeling from the hash table.  */
       dr0 = vect_peeling_hash_choose_best_peeling (loop_vinfo, &npeel,
-						   &stmt_cost_vec);
+						   &body_cost_vec);
       if (!dr0 || !npeel)
         do_peeling = false;
     }
@@ -1860,6 +1878,7 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
       if (do_peeling)
         {
 	  stmt_info_for_cost *si;
+	  void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
 
           /* (1.2) Update the DR_MISALIGNMENT of each data reference DR_i.
              If the misalignment of DR_i is identical to that of dr0 then set
@@ -1887,13 +1906,16 @@  vect_enhance_data_refs_alignment (loop_vec_info lo
 	  /* We've delayed passing the inside-loop peeling costs to the
 	     target cost model until we were sure peeling would happen.
 	     Do so now.  */
-	  if (stmt_cost_vec)
+	  if (body_cost_vec)
 	    {
-	      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, i, si)
-		(void) add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
-				      si->count, si->kind,
-				      vinfo_for_stmt (si->stmt), si->misalign);
-	      VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
+	      FOR_EACH_VEC_ELT (stmt_info_for_cost, body_cost_vec, i, si)
+		{
+		  struct _stmt_vec_info *stmt_info
+		    = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
+		  (void) add_stmt_cost (data, si->count, si->kind, stmt_info,
+					si->misalign, vect_body);
+		}
+	      VEC_free (stmt_info_for_cost, heap, body_cost_vec);
 	    }
 
 	  stat = vect_verify_datarefs_alignment (loop_vinfo, NULL);
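The deferred-recording idea used for body_cost_vec above can be illustrated
with a small standalone sketch.  It is not GCC code: the struct, the fixed-size
array standing in for VEC, and all function names are hypothetical.  Costs are
queued while a peeling candidate is being evaluated and are replayed into the
model only if peeling is actually chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum vect_cost_model_location.  */
enum pending_location { PENDING_PROLOGUE = 0, PENDING_BODY = 1,
                        PENDING_EPILOGUE = 2 };

/* Hypothetical stand-in for stmt_info_for_cost.  */
struct pending_cost { int count; int unit_cost; };

#define MAX_PENDING 16

/* Hypothetical stand-in for stmt_vector_for_cost.  */
struct pending_vec { struct pending_cost e[MAX_PENDING]; size_t n; };

/* If VEC is non-null, queue the entry for later; otherwise add it to
   MODEL immediately.  Either way, return a preliminary estimate.  */
static unsigned
sketch_record_cost (struct pending_vec *vec, unsigned *model, int count,
                    int unit_cost, enum pending_location where)
{
  unsigned estimate = (unsigned) (count * unit_cost);

  if (vec)
    {
      assert (vec->n < MAX_PENDING);
      vec->e[vec->n].count = count;
      vec->e[vec->n].unit_cost = unit_cost;
      vec->n++;
    }
  else
    model[where] += estimate;

  return estimate;
}

/* Replay every queued entry into the WHERE bucket of MODEL.  */
static void
sketch_replay (const struct pending_vec *vec, unsigned *model,
               enum pending_location where)
{
  size_t i;

  for (i = 0; i < vec->n; i++)
    model[where] += (unsigned) (vec->e[i].count * vec->e[i].unit_cost);
}

/* Usage: queue two body costs, then commit them only if peeling.  */
static unsigned
sketch_example_deferred (int do_peeling)
{
  unsigned model[3] = { 0, 0, 0 };
  struct pending_vec body_vec;

  body_vec.n = 0;
  sketch_record_cost (&body_vec, model, 2, 3, PENDING_BODY);
  sketch_record_cost (&body_vec, model, 1, 4, PENDING_BODY);

  if (do_peeling)
    sketch_replay (&body_vec, model, PENDING_BODY);

  return model[PENDING_BODY];
}
```

In this sketch the model sees a body cost of 10 when peeling is chosen and 0
otherwise, which is the reason for keeping candidate costs out of the target
data until the decision is final.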
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 189574)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -72,18 +72,18 @@  stmt_in_inner_loop_p (struct _stmt_vec_info *stmt_
    Return a preliminary estimate of the statement's cost.  */
 
 unsigned
-record_stmt_cost (stmt_vector_for_cost *stmt_cost_vec, int count,
+record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
 		  enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
-		  int misalign)
+		  int misalign, enum vect_cost_model_location where)
 {
-  if (stmt_cost_vec)
+  if (body_cost_vec)
     {
-      tree vectype = stmt_vectype (stmt_info);
-      add_stmt_info_to_vec (stmt_cost_vec, count, kind,
-			    STMT_VINFO_STMT (stmt_info), misalign);
+      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
+      add_stmt_info_to_vec (body_cost_vec, count, kind,
+			    stmt_info ? STMT_VINFO_STMT (stmt_info) : NULL,
+			    misalign);
       return (unsigned)
-	(targetm.vectorize.builtin_vectorization_cost (kind, vectype, misalign)
-	 * count);
+	(builtin_vectorization_cost (kind, vectype, misalign) * count);
 	 
     }
   else
@@ -97,7 +97,8 @@  unsigned
       else
 	target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
 
-      return add_stmt_cost (target_cost_data, count, kind, stmt_info, misalign);
+      return add_stmt_cost (target_cost_data, count, kind, stmt_info,
+			    misalign, where);
     }
 }
 
@@ -795,11 +796,12 @@  vect_mark_stmts_to_be_vectorized (loop_vec_info lo
 
 void
 vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies,
-			enum vect_def_type *dt, slp_tree slp_node,
-			stmt_vector_for_cost *stmt_cost_vec)
+			enum vect_def_type *dt,
+			stmt_vector_for_cost *prologue_cost_vec,
+			stmt_vector_for_cost *body_cost_vec)
 {
   int i;
-  int inside_cost = 0, outside_cost = 0;
+  int inside_cost = 0, prologue_cost = 0;
 
   /* The SLP costs were already calculated during SLP tree build.  */
   if (PURE_SLP_STMT (stmt_info))
@@ -807,21 +809,17 @@  vect_model_simple_cost (stmt_vec_info stmt_info, i
 
   /* FORNOW: Assuming maximum 2 args per stmts.  */
   for (i = 0; i < 2; i++)
-    {
-      if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
-	outside_cost += vect_get_stmt_cost (vector_stmt); 
-    }
+    if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
+      prologue_cost += record_stmt_cost (prologue_cost_vec, 1, vector_stmt,
+					 stmt_info, 0, vect_prologue);
 
-  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
-  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
-
   /* Pass the inside-of-loop statements to the target-specific cost model.  */
-  inside_cost = record_stmt_cost (stmt_cost_vec, ncopies, vector_stmt,
-				  stmt_info, 0);
+  inside_cost = record_stmt_cost (body_cost_vec, ncopies, vector_stmt,
+				  stmt_info, 0, vect_body);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
-             "outside_cost = %d .", inside_cost, outside_cost);
+             "prologue_cost = %d .", inside_cost, prologue_cost);
 }
 
 
@@ -835,7 +833,7 @@  vect_model_promotion_demotion_cost (stmt_vec_info
 				    enum vect_def_type *dt, int pwr)
 {
   int i, tmp;
-  int inside_cost = 0, outside_cost = 0;
+  int inside_cost = 0, prologue_cost = 0;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   void *target_cost_data;
@@ -854,22 +852,19 @@  vect_model_promotion_demotion_cost (stmt_vec_info
       tmp = (STMT_VINFO_TYPE (stmt_info) == type_promotion_vec_info_type) ?
 	(i + 1) : i;
       inside_cost += add_stmt_cost (target_cost_data, vect_pow2 (tmp),
-				    vec_promote_demote, stmt_info, 0);
+				    vec_promote_demote, stmt_info, 0,
+				    vect_body);
     }
 
   /* FORNOW: Assuming maximum 2 args per stmts.  */
   for (i = 0; i < 2; i++)
-    {
-      if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
-        outside_cost += vect_get_stmt_cost (vector_stmt);
-    }
+    if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
+      prologue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
+				      stmt_info, 0, vect_prologue);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_promotion_demotion_cost: inside_cost = %d, "
-             "outside_cost = %d .", inside_cost, outside_cost);
-
-  /* Set the costs in STMT_INFO.  */
-  stmt_vinfo_set_outside_of_loop_cost (stmt_info, NULL, outside_cost);
+             "prologue_cost = %d .", inside_cost, prologue_cost);
 }
 
 /* Function vect_cost_group_size
@@ -898,10 +893,12 @@  vect_cost_group_size (stmt_vec_info stmt_info)
 void
 vect_model_store_cost (stmt_vec_info stmt_info, int ncopies,
 		       bool store_lanes_p, enum vect_def_type dt,
-		       slp_tree slp_node, stmt_vector_for_cost *stmt_cost_vec)
+		       slp_tree slp_node,
+		       stmt_vector_for_cost *prologue_cost_vec,
+		       stmt_vector_for_cost *body_cost_vec)
 {
   int group_size;
-  unsigned int inside_cost = 0, outside_cost = 0;
+  unsigned int inside_cost = 0, prologue_cost = 0;
   struct data_reference *first_dr;
   gimple first_stmt;
 
@@ -910,7 +907,8 @@  vect_model_store_cost (stmt_vec_info stmt_info, in
     return;
 
   if (dt == vect_constant_def || dt == vect_external_def)
-    outside_cost = vect_get_stmt_cost (scalar_to_vec); 
+    prologue_cost += record_stmt_cost (prologue_cost_vec, 1, scalar_to_vec,
+				       stmt_info, 0, vect_prologue);
 
   /* Grouped access?  */
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
@@ -944,8 +942,8 @@  vect_model_store_cost (stmt_vec_info stmt_info, in
       /* Uses a high and low interleave operation for each needed permute.  */
       
       int nstmts = ncopies * exact_log2 (group_size) * group_size;
-      inside_cost = record_stmt_cost (stmt_cost_vec, nstmts, vec_perm,
-				      stmt_info, 0);
+      inside_cost = record_stmt_cost (body_cost_vec, nstmts, vec_perm,
+				      stmt_info, 0, vect_body);
 
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d .",
@@ -953,14 +951,11 @@  vect_model_store_cost (stmt_vec_info stmt_info, in
     }
 
   /* Costs of the stores.  */
-  vect_get_store_cost (first_dr, ncopies, &inside_cost, stmt_cost_vec);
+  vect_get_store_cost (first_dr, ncopies, &inside_cost, body_cost_vec);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
-             "outside_cost = %d .", inside_cost, outside_cost);
-
-  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
-  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
+             "prologue_cost = %d .", inside_cost, prologue_cost);
 }
 
 
@@ -968,7 +963,7 @@  vect_model_store_cost (stmt_vec_info stmt_info, in
 void
 vect_get_store_cost (struct data_reference *dr, int ncopies,
 		     unsigned int *inside_cost,
-		     stmt_vector_for_cost *stmt_cost_vec)
+		     stmt_vector_for_cost *body_cost_vec)
 {
   int alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
   gimple stmt = DR_STMT (dr);
@@ -978,8 +973,9 @@  vect_get_store_cost (struct data_reference *dr, in
     {
     case dr_aligned:
       {
-	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
-					  vector_store, stmt_info, 0);
+	*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
+					  vector_store, stmt_info, 0,
+					  vect_body);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_store_cost: aligned.");
@@ -990,9 +986,9 @@  vect_get_store_cost (struct data_reference *dr, in
     case dr_unaligned_supported:
       {
         /* Here, we assign an additional cost for the unaligned store.  */
-	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+	*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
 					  unaligned_store, stmt_info,
-					  DR_MISALIGNMENT (dr));
+					  DR_MISALIGNMENT (dr), vect_body);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_store_cost: unaligned supported by "
@@ -1025,13 +1021,15 @@  vect_get_store_cost (struct data_reference *dr, in
    access scheme chosen.  */
 
 void
-vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, bool load_lanes_p,
-		      slp_tree slp_node, stmt_vector_for_cost *stmt_cost_vec)
+vect_model_load_cost (stmt_vec_info stmt_info, int ncopies,
+		      bool load_lanes_p, slp_tree slp_node,
+		      stmt_vector_for_cost *prologue_cost_vec,
+		      stmt_vector_for_cost *body_cost_vec)
 {
   int group_size;
   gimple first_stmt;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
-  unsigned int inside_cost = 0, outside_cost = 0;
+  unsigned int inside_cost = 0, prologue_cost = 0;
 
   /* The SLP costs were already calculated during SLP tree build.  */
   if (PURE_SLP_STMT (stmt_info))
@@ -1059,8 +1057,8 @@  void
     {
       /* Uses an even and odd extract operations for each needed permute.  */
       int nstmts = ncopies * exact_log2 (group_size) * group_size;
-      inside_cost += record_stmt_cost (stmt_cost_vec, nstmts, vec_perm,
-				       stmt_info, 0);
+      inside_cost += record_stmt_cost (body_cost_vec, nstmts, vec_perm,
+				       stmt_info, 0, vect_body);
 
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
@@ -1072,24 +1070,22 @@  void
     {
       /* N scalar loads plus gathering them into a vector.  */
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-      inside_cost += record_stmt_cost (stmt_cost_vec,
+      inside_cost += record_stmt_cost (body_cost_vec,
 				       ncopies * TYPE_VECTOR_SUBPARTS (vectype),
-				       scalar_load, stmt_info, 0);
-      inside_cost += record_stmt_cost (stmt_cost_vec, ncopies, vec_construct,
-				       stmt_info, 0);
+				       scalar_load, stmt_info, 0, vect_body);
+      inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
+				       stmt_info, 0, vect_body);
     }
   else
     vect_get_load_cost (first_dr, ncopies,
 			((!STMT_VINFO_GROUPED_ACCESS (stmt_info))
 			 || group_size > 1 || slp_node),
-			&inside_cost, &outside_cost, stmt_cost_vec);
+			&inside_cost, &prologue_cost,
+			prologue_cost_vec, body_cost_vec, true);
 
   if (vect_print_dump_info (REPORT_COST))
     fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
-             "outside_cost = %d .", inside_cost, outside_cost);
-
-  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
-  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
+             "prologue_cost = %d .", inside_cost, prologue_cost);
 }
 
 
@@ -1097,8 +1093,10 @@  void
 void
 vect_get_load_cost (struct data_reference *dr, int ncopies,
 		    bool add_realign_cost, unsigned int *inside_cost,
-		    unsigned int *outside_cost,
-		    stmt_vector_for_cost *stmt_cost_vec)
+		    unsigned int *prologue_cost,
+		    stmt_vector_for_cost *prologue_cost_vec,
+		    stmt_vector_for_cost *body_cost_vec,
+		    bool record_prologue_costs)
 {
   int alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
   gimple stmt = DR_STMT (dr);
@@ -1108,8 +1106,8 @@  vect_get_load_cost (struct data_reference *dr, int
     {
     case dr_aligned:
       {
-	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
-					  vector_load, stmt_info, 0);
+	*inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
+					  stmt_info, 0, vect_body);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_load_cost: aligned.");
@@ -1119,9 +1117,9 @@  vect_get_load_cost (struct data_reference *dr, int
     case dr_unaligned_supported:
       {
         /* Here, we assign an additional cost for the unaligned load.  */
-	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
+	*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
 					  unaligned_load, stmt_info,
-					  DR_MISALIGNMENT (dr));
+					  DR_MISALIGNMENT (dr), vect_body);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_load_cost: unaligned supported by "
@@ -1131,17 +1129,17 @@  vect_get_load_cost (struct data_reference *dr, int
       }
     case dr_explicit_realign:
       {
-	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies * 2,
-					  vector_load, stmt_info, 0);
-	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
-					  vec_perm, stmt_info, 0);
+	*inside_cost += record_stmt_cost (body_cost_vec, ncopies * 2,
+					  vector_load, stmt_info, 0, vect_body);
+	*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
+					  vec_perm, stmt_info, 0, vect_body);
 
         /* FIXME: If the misalignment remains fixed across the iterations of
            the containing loop, the following cost should be added to the
-           outside costs.  */
+           prologue costs.  */
         if (targetm.vectorize.builtin_mask_for_load)
-	  *inside_cost += record_stmt_cost (stmt_cost_vec, 1, vector_stmt,
-					    stmt_info, 0);
+	  *inside_cost += record_stmt_cost (body_cost_vec, 1, vector_stmt,
+					    stmt_info, 0, vect_body);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump, "vect_model_load_cost: explicit realign");
@@ -1161,17 +1159,21 @@  vect_get_load_cost (struct data_reference *dr, int
            access in the group.  Inside the loop, there is a load op
            and a realignment op.  */
 
-        if (add_realign_cost)
+        if (add_realign_cost && record_prologue_costs)
           {
-            *outside_cost = 2 * vect_get_stmt_cost (vector_stmt);
+	    *prologue_cost += record_stmt_cost (prologue_cost_vec, 2,
+						vector_stmt, stmt_info,
+						0, vect_prologue);
             if (targetm.vectorize.builtin_mask_for_load)
-              *outside_cost += vect_get_stmt_cost (vector_stmt);
+	      *prologue_cost += record_stmt_cost (prologue_cost_vec, 1,
+						  vector_stmt, stmt_info,
+						  0, vect_prologue);
           }
 
-	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
-					  vector_load, stmt_info, 0);
-	*inside_cost += record_stmt_cost (stmt_cost_vec, ncopies,
-					  vec_perm, stmt_info, 0);
+	*inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
+					  stmt_info, 0, vect_body);
+	*inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_perm,
+					  stmt_info, 0, vect_body);
 
         if (vect_print_dump_info (REPORT_COST))
           fprintf (vect_dump,
@@ -3879,7 +3881,8 @@  vectorizable_store (gimple stmt, gimple_stmt_itera
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
-      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL, NULL);
+      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt,
+			     NULL, NULL, NULL);
       return true;
     }
 
@@ -4435,7 +4438,7 @@  vectorizable_load (gimple stmt, gimple_stmt_iterat
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
-      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL, NULL);
+      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL, NULL, NULL);
       return true;
     }
 
@@ -5875,7 +5878,6 @@  new_stmt_vec_info (gimple stmt, loop_vec_info loop
     STMT_VINFO_DEF_TYPE (res) = vect_internal_def;
 
   STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
-  STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
   STMT_SLP_TYPE (res) = loop_vect;
   GROUP_FIRST_ELEMENT (res) = NULL;
   GROUP_NEXT_ELEMENT (res) = NULL;
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c	(revision 189574)
+++ gcc/config/spu/spu.c	(working copy)
@@ -6623,8 +6623,8 @@  spu_builtin_vectorization_cost (enum vect_cost_for
 static void *
 spu_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
 {
-  unsigned *cost = XNEW (unsigned);
-  *cost = 0;
+  unsigned *cost = XNEWVEC (unsigned, 3);
+  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
   return cost;
 }
 
@@ -6632,24 +6632,25 @@  spu_init_cost (struct loop *loop_info ATTRIBUTE_UN
 
 static unsigned
 spu_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
-		   struct _stmt_vec_info *stmt_info, int misalign)
+		   struct _stmt_vec_info *stmt_info, int misalign,
+		   enum vect_cost_model_location where)
 {
   unsigned *cost = (unsigned *) data;
   unsigned retval = 0;
 
   if (flag_vect_cost_model)
     {
-      tree vectype = stmt_vectype (stmt_info);
+      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
       int stmt_cost = spu_builtin_vectorization_cost (kind, vectype, misalign);
 
       /* Statements in an inner loop relative to the loop being
 	 vectorized are weighted more heavily.  The value here is
 	 arbitrary and could potentially be improved with analysis.  */
-      if (stmt_in_inner_loop_p (stmt_info))
+      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
 	count *= 50;  /* FIXME.  */
 
       retval = (unsigned) (count * stmt_cost);
-      *cost += retval;
+      cost[where] += retval;
     }
 
   return retval;
@@ -6657,10 +6658,14 @@  spu_add_stmt_cost (void *data, int count, enum vec
 
 /* Implement targetm.vectorize.finish_cost.  */
 
-static unsigned
-spu_finish_cost (void *data)
+static void
+spu_finish_cost (void *data, unsigned *prologue_cost,
+		 unsigned *body_cost, unsigned *epilogue_cost)
 {
-  return *((unsigned *) data);
+  unsigned *cost = (unsigned *) data;
+  *prologue_cost = cost[vect_prologue];
+  *body_cost     = cost[vect_body];
+  *epilogue_cost = cost[vect_epilogue];
 }
 
 /* Implement targetm.vectorize.destroy_cost_data.  */
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 189574)
+++ gcc/config/i386/i386.c	(working copy)
@@ -40070,8 +40070,8 @@  ix86_autovectorize_vector_sizes (void)
 static void *
 ix86_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
 {
-  unsigned *cost = XNEW (unsigned);
-  *cost = 0;
+  unsigned *cost = XNEWVEC (unsigned, 3);
+  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
   return cost;
 }
 
@@ -40079,24 +40079,25 @@  ix86_init_cost (struct loop *loop_info ATTRIBUTE_U
 
 static unsigned
 ix86_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
-		    struct _stmt_vec_info *stmt_info, int misalign)
+		    struct _stmt_vec_info *stmt_info, int misalign,
+		    enum vect_cost_model_location where)
 {
   unsigned *cost = (unsigned *) data;
   unsigned retval = 0;
 
   if (flag_vect_cost_model)
     {
-      tree vectype = stmt_vectype (stmt_info);
+      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
       int stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
 
       /* Statements in an inner loop relative to the loop being
 	 vectorized are weighted more heavily.  The value here is
 	 arbitrary and could potentially be improved with analysis.  */
-      if (stmt_in_inner_loop_p (stmt_info))
+      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
 	count *= 50;  /* FIXME.  */
 
       retval = (unsigned) (count * stmt_cost);
-      *cost += retval;
+      cost[where] += retval;
     }
 
   return retval;
@@ -40104,10 +40105,14 @@  ix86_add_stmt_cost (void *data, int count, enum ve
 
 /* Implement targetm.vectorize.finish_cost.  */
 
-static unsigned
-ix86_finish_cost (void *data)
+static void
+ix86_finish_cost (void *data, unsigned *prologue_cost,
+		  unsigned *body_cost, unsigned *epilogue_cost)
 {
-  return *((unsigned *) data);
+  unsigned *cost = (unsigned *) data;
+  *prologue_cost = cost[vect_prologue];
+  *body_cost     = cost[vect_body];
+  *epilogue_cost = cost[vect_epilogue];
 }
 
 /* Implement targetm.vectorize.destroy_cost_data.  */
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 189574)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3525,8 +3525,8 @@  rs6000_preferred_simd_mode (enum machine_mode mode
 static void *
 rs6000_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
 {
-  unsigned *cost = XNEW (unsigned);
-  *cost = 0;
+  unsigned *cost = XNEWVEC (unsigned, 3);
+  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
   return cost;
 }
 
@@ -3534,24 +3534,25 @@  rs6000_init_cost (struct loop *loop_info ATTRIBUTE
 
 static unsigned
 rs6000_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
-		      struct _stmt_vec_info *stmt_info, int misalign)
+		      struct _stmt_vec_info *stmt_info, int misalign,
+		      enum vect_cost_model_location where)
 {
   unsigned *cost = (unsigned *) data;
   unsigned retval = 0;
 
   if (flag_vect_cost_model)
     {
-      tree vectype = stmt_vectype (stmt_info);
+      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
       int stmt_cost = rs6000_builtin_vectorization_cost (kind, vectype,
 							 misalign);
       /* Statements in an inner loop relative to the loop being
 	 vectorized are weighted more heavily.  The value here is
 	 arbitrary and could potentially be improved with analysis.  */
-      if (stmt_in_inner_loop_p (stmt_info))
+      if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
 	count *= 50;  /* FIXME.  */
 
       retval = (unsigned) (count * stmt_cost);
-      *cost += retval;
+      cost[where] += retval;
     }
 
   return retval;
@@ -3559,10 +3560,14 @@  rs6000_add_stmt_cost (void *data, int count, enum
 
 /* Implement targetm.vectorize.finish_cost.  */
 
-static unsigned
-rs6000_finish_cost (void *data)
+static void
+rs6000_finish_cost (void *data, unsigned *prologue_cost,
+		    unsigned *body_cost, unsigned *epilogue_cost)
 {
-  return *((unsigned *) data);
+  unsigned *cost = (unsigned *) data;
+  *prologue_cost = cost[vect_prologue];
+  *body_cost     = cost[vect_body];
+  *epilogue_cost = cost[vect_epilogue];
 }
 
 /* Implement targetm.vectorize.destroy_cost_data.  */
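
[Illustrative note, not part of the patch.]  The spu, i386, and rs6000
changes above all follow one pattern: the target's cost data grows from a
single counter to three accumulators indexed by location, add_stmt_cost
takes the location as an extra argument, and finish_cost hands back all
three totals.  A minimal standalone sketch of that pattern follows.  It
assumes nothing from GCC: every demo_* name is hypothetical, calloc stands
in for XNEWVEC, and the stmt_cost and in_inner_loop parameters replace the
target's builtin_vectorization_cost lookup and the stmt_in_inner_loop_p
test.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for enum vect_cost_model_location.  */
enum demo_cost_location
{
  DEMO_PROLOGUE = 0,
  DEMO_BODY = 1,
  DEMO_EPILOGUE = 2
};

/* Allocate three zeroed accumulators, one per location.  */
static void *
demo_init_cost (void)
{
  unsigned *cost = calloc (3, sizeof (unsigned));
  assert (cost != NULL);
  return cost;
}

/* Charge COUNT statements of cost STMT_COST to the accumulator for
   WHERE.  As in the patch, the inner-loop weighting (the arbitrary
   factor of 50) is applied only to statements in the loop body.  */
static unsigned
demo_add_stmt_cost (void *data, int count, int stmt_cost,
                    int in_inner_loop, enum demo_cost_location where)
{
  unsigned *cost = (unsigned *) data;
  unsigned retval;

  if (where == DEMO_BODY && in_inner_loop)
    count *= 50;

  retval = (unsigned) (count * stmt_cost);
  cost[where] += retval;
  return retval;
}

/* Report the three totals separately instead of one combined value.  */
static void
demo_finish_cost (void *data, unsigned *prologue_cost,
                  unsigned *body_cost, unsigned *epilogue_cost)
{
  unsigned *cost = (unsigned *) data;
  *prologue_cost = cost[DEMO_PROLOGUE];
  *body_cost = cost[DEMO_BODY];
  *epilogue_cost = cost[DEMO_EPILOGUE];
}

/* Release the accumulators.  */
static void
demo_destroy_cost_data (void *data)
{
  free (data);
}
```

A caller that wants the old "outside" cost can add the prologue and
epilogue totals after demo_finish_cost, which is what
vect_bb_vectorization_profitable_p does with the real hooks further down
in this patch.
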
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	(revision 189574)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -93,7 +93,7 @@  vect_free_slp_instance (slp_instance instance)
   vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
   VEC_free (int, heap, SLP_INSTANCE_LOAD_PERMUTATION (instance));
   VEC_free (slp_tree, heap, SLP_INSTANCE_LOADS (instance));
-  VEC_free (stmt_info_for_cost, heap, SLP_INSTANCE_STMT_COST_VEC (instance));
+  VEC_free (stmt_info_for_cost, heap, SLP_INSTANCE_BODY_COST_VEC (instance));
 }
 
 
@@ -121,7 +121,6 @@  vect_create_new_slp_node (VEC (gimple, heap) *scal
   SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
   SLP_TREE_VEC_STMTS (node) = NULL;
   SLP_TREE_CHILDREN (node) = VEC_alloc (slp_void_p, heap, nops);
-  SLP_TREE_OUTSIDE_OF_LOOP_COST (node) = 0;
 
   return node;
 }
@@ -179,7 +178,8 @@  vect_get_and_check_slp_defs (loop_vec_info loop_vi
                              slp_tree slp_node, gimple stmt,
 			     int ncopies_for_cost, bool first,
                              VEC (slp_oprnd_info, heap) **oprnds_info,
-			     stmt_vector_for_cost *stmt_cost_vec)
+			     stmt_vector_for_cost *prologue_cost_vec,
+			     stmt_vector_for_cost *body_cost_vec)
 {
   tree oprnd;
   unsigned int i, number_of_oprnds;
@@ -320,7 +320,8 @@  vect_get_and_check_slp_defs (loop_vec_info loop_vi
 	      if (REFERENCE_CLASS_P (lhs))
 		/* Store.  */
                 vect_model_store_cost (stmt_info, ncopies_for_cost, false,
-				       dt, slp_node, stmt_cost_vec);
+				       dt, slp_node, prologue_cost_vec,
+				       body_cost_vec);
 	      else
 		{
 		  enum vect_def_type dts[2];
@@ -329,7 +330,7 @@  vect_get_and_check_slp_defs (loop_vec_info loop_vi
 		  /* Not memory operation (we don't call this function for
 		     loads).  */
 		  vect_model_simple_cost (stmt_info, ncopies_for_cost, dts,
-					  slp_node, stmt_cost_vec);
+					  prologue_cost_vec, body_cost_vec);
 		}
 	    }
 	}
@@ -451,7 +452,8 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
                      VEC (int, heap) **load_permutation,
                      VEC (slp_tree, heap) **loads,
                      unsigned int vectorization_factor, bool *loads_permuted,
-		     stmt_vector_for_cost *stmt_cost_vec)
+		     stmt_vector_for_cost *prologue_cost_vec,
+		     stmt_vector_for_cost *body_cost_vec)
 {
   unsigned int i;
   VEC (gimple, heap) *stmts = SLP_TREE_SCALAR_STMTS (*node);
@@ -712,7 +714,8 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 	      if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node,
 						stmt, ncopies_for_cost,
 						(i == 0), &oprnds_info,
-						stmt_cost_vec))
+						prologue_cost_vec,
+						body_cost_vec))
 		{
 	  	  vect_free_oprnd_info (&oprnds_info);
  		  return false;
@@ -802,7 +805,7 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
                   /* Analyze costs (for the first stmt in the group).  */
                   vect_model_load_cost (vinfo_for_stmt (stmt),
                                         ncopies_for_cost, false, *node,
-					stmt_cost_vec);
+					prologue_cost_vec, body_cost_vec);
                 }
 
               /* Store the place of this load in the interleaving chain.  In
@@ -876,7 +879,8 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 	  /* Find the def-stmts.  */
 	  if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node, stmt,
 					    ncopies_for_cost, (i == 0),
-					    &oprnds_info, stmt_cost_vec))
+					    &oprnds_info, prologue_cost_vec,
+					    body_cost_vec))
 	    {
 	      vect_free_oprnd_info (&oprnds_info);
 	      return false;
@@ -884,9 +888,6 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 	}
     }
 
-  /* Add the costs of the node to the overall instance costs.  */
-  *outside_cost += SLP_TREE_OUTSIDE_OF_LOOP_COST (*node);
-
   /* Grouped loads were reached - stop the recursion.  */
   if (stop_recursion)
     {
@@ -895,8 +896,8 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
         {
 	  gimple first_stmt = VEC_index (gimple, stmts, 0);
           *loads_permuted = true;
-	  (void) record_stmt_cost (stmt_cost_vec, group_size, vec_perm, 
-				   vinfo_for_stmt (first_stmt), 0);
+	  (void) record_stmt_cost (body_cost_vec, group_size, vec_perm, 
+				   vinfo_for_stmt (first_stmt), 0, vect_body);
         }
       else
         {
@@ -925,7 +926,7 @@  vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 				   outside_cost, ncopies_for_cost,
 				   max_nunits, load_permutation, loads,
 				   vectorization_factor, loads_permuted,
-				   stmt_cost_vec))
+				   prologue_cost_vec, body_cost_vec))
         {
 	  if (child)
 	    oprnd_info->def_stmts = NULL;
@@ -1470,7 +1471,8 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
   struct data_reference *dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
   bool loads_permuted = false;
   VEC (gimple, heap) *scalar_stmts;
-  stmt_vector_for_cost stmt_cost_vec;
+  stmt_vector_for_cost body_cost_vec, prologue_cost_vec;
+  stmt_info_for_cost *si;
 
   if (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)))
     {
@@ -1556,15 +1558,19 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
 
   load_permutation = VEC_alloc (int, heap, group_size * group_size);
   loads = VEC_alloc (slp_tree, heap, group_size);
-  stmt_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 10);
+  prologue_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 10);
+  body_cost_vec = VEC_alloc (stmt_info_for_cost, heap, 10);
 
   /* Build the tree for the SLP instance.  */
   if (vect_build_slp_tree (loop_vinfo, bb_vinfo, &node, group_size,
                            &outside_cost, ncopies_for_cost,
 			   &max_nunits, &load_permutation, &loads,
 			   vectorization_factor, &loads_permuted,
-			   &stmt_cost_vec))
+			   &prologue_cost_vec, &body_cost_vec))
     {
+      void *data = (loop_vinfo ? LOOP_VINFO_TARGET_COST_DATA (loop_vinfo)
+		    : BB_VINFO_TARGET_COST_DATA (bb_vinfo));
+
       /* Calculate the unrolling factor based on the smallest type.  */
       if (max_nunits > nunits)
         unrolling_factor = least_common_multiple (max_nunits, group_size)
@@ -1575,7 +1581,8 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
           if (vect_print_dump_info (REPORT_SLP))
             fprintf (vect_dump, "Build SLP failed: unrolling required in basic"
                                " block SLP");
-	  VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
+	  VEC_free (stmt_info_for_cost, heap, body_cost_vec);
+	  VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
           return false;
         }
 
@@ -1584,8 +1591,7 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
       SLP_INSTANCE_TREE (new_instance) = node;
       SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
       SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
-      SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (new_instance) = outside_cost;
-      SLP_INSTANCE_STMT_COST_VEC (new_instance) = stmt_cost_vec;
+      SLP_INSTANCE_BODY_COST_VEC (new_instance) = body_cost_vec;
       SLP_INSTANCE_LOADS (new_instance) = loads;
       SLP_INSTANCE_FIRST_LOAD_STMT (new_instance) = NULL;
       SLP_INSTANCE_LOAD_PERMUTATION (new_instance) = load_permutation;
@@ -1603,6 +1609,7 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
                 }
 
               vect_free_slp_instance (new_instance);
+	      VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
               return false;
             }
 
@@ -1612,6 +1619,19 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
       else
         VEC_free (int, heap, SLP_INSTANCE_LOAD_PERMUTATION (new_instance));
 
+      /* Record the prologue costs, which were delayed until we were
+	 sure that SLP was successful.  Unlike the body costs, we know
+	 the final values now regardless of the loop vectorization factor.  */
+      FOR_EACH_VEC_ELT (stmt_info_for_cost, prologue_cost_vec, i, si)
+	{
+	  struct _stmt_vec_info *stmt_info
+	    = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
+	  (void) add_stmt_cost (data, si->count, si->kind, stmt_info,
+				si->misalign, vect_prologue);
+	}
+
+      VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
+
       if (loop_vinfo)
         VEC_safe_push (slp_instance, heap,
                        LOOP_VINFO_SLP_INSTANCES (loop_vinfo),
@@ -1626,7 +1646,10 @@  vect_analyze_slp_instance (loop_vec_info loop_vinf
       return true;
     }
   else
-    VEC_free (stmt_info_for_cost, heap, stmt_cost_vec);
+    {
+      VEC_free (stmt_info_for_cost, heap, body_cost_vec);
+      VEC_free (stmt_info_for_cost, heap, prologue_cost_vec);
+    }
 
   /* Failed to SLP.  */
   /* Free the allocated memory.  */
@@ -1932,26 +1955,27 @@  vect_bb_vectorization_profitable_p (bb_vec_info bb
   slp_instance instance;
   int i, j;
   unsigned int vec_inside_cost = 0, vec_outside_cost = 0, scalar_cost = 0;
+  unsigned int vec_prologue_cost = 0, vec_epilogue_cost = 0;
   unsigned int stmt_cost;
   gimple stmt;
   gimple_stmt_iterator si;
   basic_block bb = BB_VINFO_BB (bb_vinfo);
+  void *target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
   stmt_vec_info stmt_info = NULL;
-  tree dummy_type = NULL;
-  int dummy = 0;
-  stmt_vector_for_cost stmt_cost_vec;
+  stmt_vector_for_cost body_cost_vec;
   stmt_info_for_cost *ci;
 
   /* Calculate vector costs.  */
   FOR_EACH_VEC_ELT (slp_instance, slp_instances, i, instance)
     {
-      vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
-      stmt_cost_vec = SLP_INSTANCE_STMT_COST_VEC (instance);
+      body_cost_vec = SLP_INSTANCE_BODY_COST_VEC (instance);
 
-      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, j, ci)
-	(void) add_stmt_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo),
-			      ci->count, ci->kind,
-			      vinfo_for_stmt (ci->stmt), ci->misalign);
+      FOR_EACH_VEC_ELT (stmt_info_for_cost, body_cost_vec, j, ci)
+	{
+	  stmt_info = ci->stmt ? vinfo_for_stmt (ci->stmt) : NULL;
+	  (void) add_stmt_cost (target_cost_data, ci->count, ci->kind,
+				stmt_info, ci->misalign, vect_body);
+	}
     }
 
   /* Calculate scalar cost.  */
@@ -1967,29 +1991,29 @@  vect_bb_vectorization_profitable_p (bb_vec_info bb
       if (STMT_VINFO_DATA_REF (stmt_info))
         {
           if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)))
-            stmt_cost = targetm.vectorize.builtin_vectorization_cost 
-                          (scalar_load, dummy_type, dummy);
+            stmt_cost = vect_get_stmt_cost (scalar_load);
           else
-            stmt_cost = targetm.vectorize.builtin_vectorization_cost
-                          (scalar_store, dummy_type, dummy);
+            stmt_cost = vect_get_stmt_cost (scalar_store);
         }
       else
-        stmt_cost = targetm.vectorize.builtin_vectorization_cost
-                      (scalar_stmt, dummy_type, dummy);
+        stmt_cost = vect_get_stmt_cost (scalar_stmt);
 
       scalar_cost += stmt_cost;
     }
 
   /* Complete the target-specific cost calculation.  */
-  vec_inside_cost = finish_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo));
+  finish_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo), &vec_prologue_cost,
+	       &vec_inside_cost, &vec_epilogue_cost);
 
+  vec_outside_cost = vec_prologue_cost + vec_epilogue_cost;
+
   if (vect_print_dump_info (REPORT_COST))
     {
       fprintf (vect_dump, "Cost model analysis: \n");
       fprintf (vect_dump, "  Vector inside of basic block cost: %d\n",
                vec_inside_cost);
-      fprintf (vect_dump, "  Vector outside of basic block cost: %d\n",
-               vec_outside_cost);
+      fprintf (vect_dump, "  Vector prologue cost: %d\n", vec_prologue_cost);
+      fprintf (vect_dump, "  Vector epilogue cost: %d\n", vec_epilogue_cost);
       fprintf (vect_dump, "  Scalar cost of basic block: %d", scalar_cost);
     }
 
@@ -2200,8 +2224,9 @@  vect_update_slp_costs_according_to_vf (loop_vec_in
   unsigned int i, j, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
   slp_instance instance;
-  stmt_vector_for_cost stmt_cost_vec;
+  stmt_vector_for_cost body_cost_vec;
   stmt_info_for_cost *si;
+  void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
 
   if (vect_print_dump_info (REPORT_SLP))
     fprintf (vect_dump, "=== vect_update_slp_costs_according_to_vf ===");
@@ -2214,12 +2239,12 @@  vect_update_slp_costs_according_to_vf (loop_vec_in
       /* Record the instance's instructions in the target cost model.
 	 This was delayed until here because the count of instructions
 	 isn't known beforehand.  */
-      stmt_cost_vec = SLP_INSTANCE_STMT_COST_VEC (instance);
+      body_cost_vec = SLP_INSTANCE_BODY_COST_VEC (instance);
 
-      FOR_EACH_VEC_ELT (stmt_info_for_cost, stmt_cost_vec, j, si)
-	(void) add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
-			      si->count * ncopies, si->kind,
-			      vinfo_for_stmt (si->stmt), si->misalign);
+      FOR_EACH_VEC_ELT (stmt_info_for_cost, body_cost_vec, j, si)
+	(void) add_stmt_cost (data, si->count * ncopies, si->kind,
+			      vinfo_for_stmt (si->stmt), si->misalign,
+			      vect_body);
     }
 }