More of ipa-inline housekeeping

Message ID 20110413222034.GA12767@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka April 13, 2011, 10:20 p.m. UTC
Hi,
this patch moves inline_summary from a field in cgraph_node into its own
on-the-side data structure. This replaces an arcane decision of mine to split
all IPA data into global/local data stored in a common data structure with the
scheme we developed for new IPA passes some time ago.

The advantage is that the code is more contained and less spread across the
compiler. We also make cgraph_node smaller and dumps more compact, which never
hurts.
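
A minimal sketch of the on-the-side scheme described above, assuming a toy node
type with a dense integer uid. The names toy_node, toy_summary,
toy_summary_alloc and toy_summary_vec are hypothetical stand-ins for
illustration, not the GCC types or the VEC API used in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the GCC types.  */
struct toy_node { int uid; };
struct toy_summary { int self_size; int self_time; };

/* Side vector holding one summary per node, indexed by node uid.  */
static struct toy_summary *toy_summary_vec;
static size_t toy_summary_len;

/* Grow the side vector so that index MAX_UID is valid.  New slots are
   zeroed; existing slots keep their contents.  */
static void
toy_summary_alloc (int max_uid)
{
  size_t want = (size_t) max_uid + 1;
  struct toy_summary *p;

  if (toy_summary_len >= want)
    return;
  p = realloc (toy_summary_vec, want * sizeof *p);
  if (!p)
    abort ();
  memset (p + toy_summary_len, 0, (want - toy_summary_len) * sizeof *p);
  toy_summary_vec = p;
  toy_summary_len = want;
}

/* Accessor: the only place that knows where the summary is stored.  */
static struct toy_summary *
toy_summary (const struct toy_node *node)
{
  assert (node->uid >= 0 && (size_t) node->uid < toy_summary_len);
  return &toy_summary_vec[node->uid];
}
```

Callers that go through the accessor do not care whether the summary lives in
the node or in a side vector. One caveat of this sketch: growing the vector may
move it, so a pointer returned by toy_summary should not be held across a call
to toy_summary_alloc.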

While working on it I noticed that Richi's patch to introduce cgraph_edge
times/sizes is a bit iffy in computing data when they are missing in the
data structure. It also computes incoming edge costs instead of outgoing ones,
which means that not all edges get their info computed for the IPA inliner
(think of newly discovered direct calls or IPA merging).
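
A toy model of the incoming/outgoing difference, under the assumption that a
newly discovered call only becomes visible when the caller's body is scanned
again. The call_edge type and the fill_incoming, fill_outgoing and
count_missing helpers are hypothetical and only illustrate the bookkeeping;
they do not reflect the real cgraph code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy call edge.  */
struct call_edge
{
  int caller;
  int callee;
  int stmt_size;       /* Size of the call statement in the caller body.  */
  int call_stmt_size;  /* Cached copy used by the inliner; 0 if missing.  */
};

/* Incoming scheme: when CALLEE is analyzed, fill in the edges that point
   to it at that moment.  */
static void
fill_incoming (struct call_edge *edges, size_t n, int callee)
{
  for (size_t i = 0; i < n; i++)
    if (edges[i].callee == callee)
      edges[i].call_stmt_size = edges[i].stmt_size;
}

/* Outgoing scheme: when CALLER is analyzed, fill in the calls found in
   its body.  */
static void
fill_outgoing (struct call_edge *edges, size_t n, int caller)
{
  for (size_t i = 0; i < n; i++)
    if (edges[i].caller == caller)
      edges[i].call_stmt_size = edges[i].stmt_size;
}

/* Count edges whose cached size was never computed.  */
static size_t
count_missing (const struct call_edge *edges, size_t n)
{
  size_t missing = 0;

  for (size_t i = 0; i < n; i++)
    if (edges[i].call_stmt_size == 0)
      missing++;
  return missing;
}
```

In this model an edge that appears after its callee was analyzed stays
uninitialized under the incoming scheme, while rescanning the caller fills it
in under the outgoing scheme.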

I fixed this and added a sanity check that the fields are initialized.
This has shown a problem with early inliner iteration, fixed thusly, and the
fact that the early inliner is attempting to compute overall growth at a time
when the inline parameters are not yet computed for functions not visited by
early optimizations. We previously agreed that the early inliner should not try
to do that (as this leads to the early inliner inlining functions called once
that should be deferred for later consideration).  I just hope it won't cause
benchmarks to regress too much ;)
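
A small sketch of the sanity check mentioned above, assuming toy types. The
growth formula mirrors the one in estimate_edge_growth in the patch (callee
size minus size benefit minus call statement size); the inline_callee and
inline_edge types and the function name are hypothetical.

```c
#include <assert.h>

/* Hypothetical toy versions of the callee summary and the call edge.  */
struct inline_callee { int size; int size_inlining_benefit; };
struct inline_edge { const struct inline_callee *callee; int call_stmt_size; };

/* Estimate how much the caller grows when EDGE is inlined.  The cached
   call statement size must already be computed; a zero value means the
   edge was never analyzed, so fail loudly instead of recomputing.  */
static int
toy_estimate_edge_growth (const struct inline_edge *edge)
{
  assert (edge->call_stmt_size > 0);
  return (edge->callee->size
          - edge->callee->size_inlining_benefit
          - edge->call_stmt_size);
}
```

For example, a callee of size 20 with a size benefit of 5, called through a
statement of size 3, gives an estimated growth of 12.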

Having a place to pile inline analysis info in, there is more to clean up. The
cgraph_local/cgraph_global fields probably should go and the stuff from global
info should go into the inline_summary data structure, too (the lifetimes are
essentially the same so there is no need for the split).  I will handle this
incrementally.

Bootstrapped/regtested x86_64-linux with a slightly modified version of the
patch. Re-testing with the final version; I intend to commit the patch tomorrow.

Honza

	* cgraph.c (dump_cgraph_node): Do not dump inline summaries.
	* cgraph.h (struct inline_summary): Move to ipa-inline.h
	(cgraph_local_info): Remove inline_summary.
	* ipa-cp.c: Include ipa-inline.h.
	(ipcp_cloning_candidate_p, ipcp_estimate_growth,
	ipcp_estimate_cloning_cost, ipcp_insert_stage): Use inline_summary
	accessor.
	* lto-cgraph.c (lto_output_node): Do not stream inline summary.
	(input_overwrite_node): Do not set inline summary.
	(input_node): Do not stream inline summary.
	* ipa-inline.c (cgraph_decide_inlining): Dump inline summaries.
	(cgraph_decide_inlining_incrementally): Do not try to estimate overall
	growth; we do not have inline parameters computed for that anyway.
	(cgraph_early_inlining): After inlining compute call_stmt_sizes.
	* ipa-inline.h (struct inline_summary): Move here from cgraph.h
	(inline_summary_t): New type and VECtor.
	(debug_inline_summary, dump_inline_summaries): Declare.
	(inline_summary): Use VECtor.
	(estimate_edge_growth): Kill hack computing call stmt size directly.
	* lto-section-in.c (lto_section_name): Add inline section.
	* ipa-inline-analysis.c: Include lto-streamer.h
	(node_removal_hook_holder, node_duplication_hook_holder): New holders
	(inline_node_removal_hook, inline_node_duplication_hook): New functions.
	(inline_summary_vec): Define.
	(inline_summary_alloc, dump_inline_summary, debug_inline_summary,
	dump_inline_summaries): New functions.
	(estimate_function_body_sizes): Properly compute size/time of outgoing calls.
	(compute_inline_parameters): Alloc inline_summary; do not compute size/time
	of incoming calls.
	(estimate_edge_time): Avoid missing time summary hack.
	(inline_read_summary): Read inline summary info.
	(inline_write_summary): Write inline summary info.
	(inline_free_summary): Free all hooks and inline summary vector.
	* lto-streamer.h: Add LTO_section_inline_summary section.
	* Makefile.in (ipa-cp.o, ipa-inline-analysis.o): Update dependencies.
	* ipa.c (cgraph_remove_unreachable_nodes): Fix dump file formatting.

	* lto.c: Include ipa-inline.h
	(add_cgraph_node_to_partition, undo_partition): Use inline_summary accessor.
	(ipa_node_duplication_hook): Fix declaration.
	* Make-lang.in (lto.o): Update dependencies.
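
The inline_write_summary and inline_read_summary entries above stream the
summary fields into the new LTO section as ULEB128 values. As a hedged
illustration of that encoding only, here is a self-contained sketch working on
a plain byte buffer. The uleb128_write and uleb128_read helpers are
hypothetical and are not the lto_output_uleb128_stream or lto_input_uleb128
functions used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store VALUE at BUF in ULEB128 form and return the number of bytes
   written.  BUF must have room for up to 10 bytes.  */
static size_t
uleb128_write (uint8_t *buf, uint64_t value)
{
  size_t n = 0;

  do
    {
      uint8_t byte = value & 0x7f;   /* Low seven bits of the value.  */
      value >>= 7;
      if (value)
        byte |= 0x80;                /* More bytes follow.  */
      buf[n++] = byte;
    }
  while (value);
  return n;
}

/* Decode one well-formed ULEB128 value from BUF into *VALUE and return
   the number of bytes consumed.  */
static size_t
uleb128_read (const uint8_t *buf, uint64_t *value)
{
  uint64_t result = 0;
  unsigned shift = 0;
  size_t n = 0;
  uint8_t byte;

  do
    {
      byte = buf[n++];
      result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  *value = result;
  return n;
}
```

Small values such as typical function sizes fit in a single byte, which is why
a variable-length encoding suits per-function summaries. The reader in this
sketch does no bounds checking and assumes the input was produced by the
matching writer.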

Comments

Richard Biener April 14, 2011, 8:58 a.m. UTC | #1
On Thu, Apr 14, 2011 at 12:20 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch moves inline_summary from a field in cgraph_node into its own
> on-the-side data structure. This replaces an arcane decision of mine to split
> all IPA data into global/local data stored in a common data structure with the
> scheme we developed for new IPA passes some time ago.
>
> The advantage is that the code is more contained and less spread across the
> compiler. We also make cgraph_node smaller and dumps more compact, which never
> hurts.
>
> While working on it I noticed that Richi's patch to introduce cgraph_edge
> times/sizes is a bit iffy in computing data when they are missing in the
> data structure. It also computes incoming edge costs instead of outgoing ones,
> which means that not all edges get their info computed for the IPA inliner
> (think of newly discovered direct calls or IPA merging).

Ah, that was the reason ... I didn't dig deep enough ... ;)

>
> I fixed this and added a sanity check that the fields are initialized.
> This has shown a problem with early inliner iteration, fixed thusly, and the
> fact that the early inliner is attempting to compute overall growth at a time
> when the inline parameters are not yet computed for functions not visited by
> early optimizations. We previously agreed that the early inliner should not try
> to do that (as this leads to the early inliner inlining functions called once
> that should be deferred for later consideration).  I just hope it won't cause
> benchmarks to regress too much ;)

Yeah, we agreed to that.  And I forgot about it as it wasn't part of the
early inliner reorg (which was supposed to be a 1:1 transform).

>
> Having a place to pile inline analysis info in, there is more to clean up. The
> cgraph_local/cgraph_global fields probably should go and the stuff from global
> info should go into the inline_summary data structure, too (the lifetimes are
> essentially the same so there is no need for the split).  I will handle this
> incrementally.
>
> Bootstrapped/regtested x86_64-linux with a slightly modified version of the
> patch. Re-testing with the final version; I intend to commit the patch tomorrow.

I looked over the patch and it looks ok to me.

Thanks,
Richard.

> Honza
>
>        * cgraph.c (dump_cgraph_node): Do not dump inline summaries.
>        * cgraph.h (struct inline_summary): Move to ipa-inline.h
>        (cgraph_local_info): Remove inline_summary.
>        * ipa-cp.c: Include ipa-inline.h.
>        (ipcp_cloning_candidate_p, ipcp_estimate_growth,
>        ipcp_estimate_cloning_cost, ipcp_insert_stage): Use inline_summary
>        accessor.
>        * lto-cgraph.c (lto_output_node): Do not stream inline summary.
>        (input_overwrite_node): Do not set inline summary.
>        (input_node): Do not stream inline summary.
>        * ipa-inline.c (cgraph_decide_inlining): Dump inline summaries.
>        (cgraph_decide_inlining_incrementally): Do not try to estimate overall
>        growth; we do not have inline parameters computed for that anyway.
>        (cgraph_early_inlining): After inlining compute call_stmt_sizes.
>        * ipa-inline.h (struct inline_summary): Move here from cgraph.h
>        (inline_summary_t): New type and VECtor.
>        (debug_inline_summary, dump_inline_summaries): Declare.
>        (inline_summary): Use VECtor.
>        (estimate_edge_growth): Kill hack computing call stmt size directly.
>        * lto-section-in.c (lto_section_name): Add inline section.
>        * ipa-inline-analysis.c: Include lto-streamer.h
>        (node_removal_hook_holder, node_duplication_hook_holder): New holders
>        (inline_node_removal_hook, inline_node_duplication_hook): New functions.
>        (inline_summary_vec): Define.
>        (inline_summary_alloc, dump_inline_summary, debug_inline_summary,
>        dump_inline_summaries): New functions.
>        (estimate_function_body_sizes): Properly compute size/time of outgoing calls.
>        (compute_inline_parameters): Alloc inline_summary; do not compute size/time
>        of incoming calls.
>        (estimate_edge_time): Avoid missing time summary hack.
>        (inline_read_summary): Read inline summary info.
>        (inline_write_summary): Write inline summary info.
>        (inline_free_summary): Free all hooks and inline summary vector.
>        * lto-streamer.h: Add LTO_section_inline_summary section.
>        * Makefile.in (ipa-cp.o, ipa-inline-analysis.o): Update dependencies.
>        * ipa.c (cgraph_remove_unreachable_nodes): Fix dump file formatting.
>
>        * lto.c: Include ipa-inline.h
>        (add_cgraph_node_to_partition, undo_partition): Use inline_summary accessor.
>        (ipa_node_duplication_hook): Fix declaration.
>        * Make-lang.in (lto.o): Update dependencies.
> Index: cgraph.c
> ===================================================================
> --- cgraph.c    (revision 172396)
> +++ cgraph.c    (working copy)
> @@ -1876,22 +1876,6 @@ dump_cgraph_node (FILE *f, struct cgraph
>   if (node->count)
>     fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
>             (HOST_WIDEST_INT)node->count);
> -  if (node->local.inline_summary.self_time)
> -    fprintf (f, " %i time, %i benefit", node->local.inline_summary.self_time,
> -                                       node->local.inline_summary.time_inlining_benefit);
> -  if (node->global.time && node->global.time
> -      != node->local.inline_summary.self_time)
> -    fprintf (f, " (%i after inlining)", node->global.time);
> -  if (node->local.inline_summary.self_size)
> -    fprintf (f, " %i size, %i benefit", node->local.inline_summary.self_size,
> -                                       node->local.inline_summary.size_inlining_benefit);
> -  if (node->global.size && node->global.size
> -      != node->local.inline_summary.self_size)
> -    fprintf (f, " (%i after inlining)", node->global.size);
> -  if (node->local.inline_summary.estimated_self_stack_size)
> -    fprintf (f, " %i bytes stack usage", (int)node->local.inline_summary.estimated_self_stack_size);
> -  if (node->global.estimated_stack_size != node->local.inline_summary.estimated_self_stack_size)
> -    fprintf (f, " %i bytes after inlining", (int)node->global.estimated_stack_size);
>   if (node->origin)
>     fprintf (f, " nested in: %s", cgraph_node_name (node->origin));
>   if (node->needed)
> Index: cgraph.h
> ===================================================================
> --- cgraph.h    (revision 172396)
> +++ cgraph.h    (working copy)
> @@ -58,23 +58,6 @@ struct lto_file_decl_data;
>  extern const char * const cgraph_availability_names[];
>  extern const char * const ld_plugin_symbol_resolution_names[];
>
> -/* Function inlining information.  */
> -
> -struct GTY(()) inline_summary
> -{
> -  /* Estimated stack frame consumption by the function.  */
> -  HOST_WIDE_INT estimated_self_stack_size;
> -
> -  /* Size of the function body.  */
> -  int self_size;
> -  /* How many instructions are likely going to disappear after inlining.  */
> -  int size_inlining_benefit;
> -  /* Estimated time spent executing the function body.  */
> -  int self_time;
> -  /* How much time is going to be saved by inlining.  */
> -  int time_inlining_benefit;
> -};
> -
>  /* Information about thunk, used only for same body aliases.  */
>
>  struct GTY(()) cgraph_thunk_info {
> @@ -95,8 +78,6 @@ struct GTY(()) cgraph_local_info {
>   /* File stream where this node is being written to.  */
>   struct lto_file_decl_data * lto_file_data;
>
> -  struct inline_summary inline_summary;
> -
>   /* Set when function function is visible in current compilation unit only
>      and its address is never taken.  */
>   unsigned local : 1;
> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c    (revision 172396)
> +++ ipa-cp.c    (working copy)
> @@ -148,6 +148,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-inline.h"
>  #include "fibheap.h"
>  #include "params.h"
> +#include "ipa-inline.h"
>
>  /* Number of functions identified as candidates for cloning. When not cloning
>    we can simplify iterate stage not forcing it to go through the decision
> @@ -495,7 +496,7 @@ ipcp_cloning_candidate_p (struct cgraph_
>                 cgraph_node_name (node));
>       return false;
>     }
> -  if (node->local.inline_summary.self_size < n_calls)
> +  if (inline_summary (node)->self_size < n_calls)
>     {
>       if (dump_file)
>         fprintf (dump_file, "Considering %s for cloning; code would shrink.\n",
> @@ -1189,7 +1190,7 @@ ipcp_estimate_growth (struct cgraph_node
>      call site.  Precise cost is difficult to get, as our size metric counts
>      constants and moves as free.  Generally we are looking for cases that
>      small function is called very many times.  */
> -  growth = node->local.inline_summary.self_size
> +  growth = inline_summary (node)->self_size
>           - removable_args * redirectable_node_callers;
>   if (growth < 0)
>     return 0;
> @@ -1229,7 +1230,7 @@ ipcp_estimate_cloning_cost (struct cgrap
>     cost /= freq_sum * 1000 / REG_BR_PROB_BASE + 1;
>   if (dump_file)
>     fprintf (dump_file, "Cost of versioning %s is %i, (size: %i, freq: %i)\n",
> -             cgraph_node_name (node), cost, node->local.inline_summary.self_size,
> +             cgraph_node_name (node), cost, inline_summary (node)->self_size,
>             freq_sum);
>   return cost + 1;
>  }
> @@ -1364,7 +1365,7 @@ ipcp_insert_stage (void)
>       {
>        if (node->count > max_count)
>          max_count = node->count;
> -       overall_size += node->local.inline_summary.self_size;
> +       overall_size += inline_summary (node)->self_size;
>       }
>
>   max_new_size = overall_size;
> Index: lto-cgraph.c
> ===================================================================
> --- lto-cgraph.c        (revision 172396)
> +++ lto-cgraph.c        (working copy)
> @@ -465,16 +465,6 @@ lto_output_node (struct lto_simple_outpu
>
>   if (tag == LTO_cgraph_analyzed_node)
>     {
> -      lto_output_sleb128_stream (ob->main_stream,
> -                                node->local.inline_summary.estimated_self_stack_size);
> -      lto_output_sleb128_stream (ob->main_stream,
> -                                node->local.inline_summary.self_size);
> -      lto_output_sleb128_stream (ob->main_stream,
> -                                node->local.inline_summary.size_inlining_benefit);
> -      lto_output_sleb128_stream (ob->main_stream,
> -                                node->local.inline_summary.self_time);
> -      lto_output_sleb128_stream (ob->main_stream,
> -                                node->local.inline_summary.time_inlining_benefit);
>       if (node->global.inlined_to)
>        {
>          ref = lto_cgraph_encoder_lookup (encoder, node->global.inlined_to);
> @@ -930,23 +920,9 @@ input_overwrite_node (struct lto_file_de
>                      struct cgraph_node *node,
>                      enum LTO_cgraph_tags tag,
>                      struct bitpack_d *bp,
> -                     unsigned int stack_size,
> -                     unsigned int self_time,
> -                     unsigned int time_inlining_benefit,
> -                     unsigned int self_size,
> -                     unsigned int size_inlining_benefit,
>                      enum ld_plugin_symbol_resolution resolution)
>  {
>   node->aux = (void *) tag;
> -  node->local.inline_summary.estimated_self_stack_size = stack_size;
> -  node->local.inline_summary.self_time = self_time;
> -  node->local.inline_summary.time_inlining_benefit = time_inlining_benefit;
> -  node->local.inline_summary.self_size = self_size;
> -  node->local.inline_summary.size_inlining_benefit = size_inlining_benefit;
> -  node->global.time = self_time;
> -  node->global.size = self_size;
> -  node->global.estimated_stack_size = stack_size;
> -  node->global.estimated_growth = INT_MIN;
>   node->local.lto_file_data = file_data;
>
>   node->local.local = bp_unpack_value (bp, 1);
> @@ -1023,13 +999,8 @@ input_node (struct lto_file_decl_data *f
>   tree fn_decl;
>   struct cgraph_node *node;
>   struct bitpack_d bp;
> -  int stack_size = 0;
>   unsigned decl_index;
>   int ref = LCC_NOT_FOUND, ref2 = LCC_NOT_FOUND;
> -  int self_time = 0;
> -  int self_size = 0;
> -  int time_inlining_benefit = 0;
> -  int size_inlining_benefit = 0;
>   unsigned long same_body_count = 0;
>   int clone_ref;
>   enum ld_plugin_symbol_resolution resolution;
> @@ -1051,15 +1022,7 @@ input_node (struct lto_file_decl_data *f
>   node->count_materialization_scale = lto_input_sleb128 (ib);
>
>   if (tag == LTO_cgraph_analyzed_node)
> -    {
> -      stack_size = lto_input_sleb128 (ib);
> -      self_size = lto_input_sleb128 (ib);
> -      size_inlining_benefit = lto_input_sleb128 (ib);
> -      self_time = lto_input_sleb128 (ib);
> -      time_inlining_benefit = lto_input_sleb128 (ib);
> -
> -      ref = lto_input_sleb128 (ib);
> -    }
> +    ref = lto_input_sleb128 (ib);
>
>   ref2 = lto_input_sleb128 (ib);
>
> @@ -1073,9 +1036,7 @@ input_node (struct lto_file_decl_data *f
>
>   bp = lto_input_bitpack (ib);
>   resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib);
> -  input_overwrite_node (file_data, node, tag, &bp, stack_size, self_time,
> -                       time_inlining_benefit, self_size,
> -                       size_inlining_benefit, resolution);
> +  input_overwrite_node (file_data, node, tag, &bp, resolution);
>
>   /* Store a reference for now, and fix up later to be a pointer.  */
>   node->global.inlined_to = (cgraph_node_ptr) (intptr_t) ref;
> Index: ipa-inline.c
> ===================================================================
> --- ipa-inline.c        (revision 172396)
> +++ ipa-inline.c        (working copy)
> @@ -1301,6 +1301,9 @@ cgraph_decide_inlining (void)
>              max_benefit = benefit;
>          }
>       }
> +
> +  if (dump_file)
> +    dump_inline_summaries (dump_file);
>   gcc_assert (in_lto_p
>              || !max_count
>              || (profile_info && flag_branch_probabilities));
> @@ -1558,8 +1561,7 @@ cgraph_decide_inlining_incrementally (st
>       /* When the function body would grow and inlining the function
>         won't eliminate the need for offline copy of the function,
>         don't inline.  */
> -      if (estimate_edge_growth (e) > allowed_growth
> -         && estimate_growth (e->callee) > allowed_growth)
> +      if (estimate_edge_growth (e) > allowed_growth)
>        {
>          if (dump_file)
>            fprintf (dump_file,
> @@ -1601,6 +1603,7 @@ static unsigned int
>  cgraph_early_inlining (void)
>  {
>   struct cgraph_node *node = cgraph_get_node (current_function_decl);
> +  struct cgraph_edge *edge;
>   unsigned int todo = 0;
>   int iterations = 0;
>   bool inlined = false;
> @@ -1652,6 +1655,19 @@ cgraph_early_inlining (void)
>     {
>       timevar_push (TV_INTEGRATION);
>       todo |= optimize_inline_calls (current_function_decl);
> +
> +      /* Technically we ought to recompute inline parameters so the new iteration of
> +        early inliner works as expected.  We however have values approximately right
> +        and thus we only need to update edge info that might be cleared out for
> +        newly discovered edges.  */
> +      for (edge = node->callees; edge; edge = edge->next_callee)
> +       {
> +         edge->call_stmt_size
> +           = estimate_num_insns (edge->call_stmt, &eni_size_weights);
> +         edge->call_stmt_time
> +           = estimate_num_insns (edge->call_stmt, &eni_time_weights);
> +       }
> +
>       timevar_pop (TV_INTEGRATION);
>     }
>
> Index: ipa-inline.h
> ===================================================================
> --- ipa-inline.h        (revision 172396)
> +++ ipa-inline.h        (working copy)
> @@ -19,6 +19,30 @@ You should have received a copy of the G
>  along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>
> +/* Function inlining information.  */
> +
> +struct inline_summary
> +{
> +  /* Estimated stack frame consumption by the function.  */
> +  HOST_WIDE_INT estimated_self_stack_size;
> +
> +  /* Size of the function body.  */
> +  int self_size;
> +  /* How many instructions are likely going to disappear after inlining.  */
> +  int size_inlining_benefit;
> +  /* Estimated time spent executing the function body.  */
> +  int self_time;
> +  /* How much time is going to be saved by inlining.  */
> +  int time_inlining_benefit;
> +};
> +
> +typedef struct inline_summary inline_summary_t;
> +DEF_VEC_O(inline_summary_t);
> +DEF_VEC_ALLOC_O(inline_summary_t,heap);
> +extern VEC(inline_summary_t,heap) *inline_summary_vec;
> +
> +void debug_inline_summary (struct cgraph_node *);
> +void dump_inline_summaries (FILE *f);
>  void inline_generate_summary (void);
>  void inline_read_summary (void);
>  void inline_write_summary (cgraph_node_set, varpool_node_set);
> @@ -30,7 +54,7 @@ int estimate_growth (struct cgraph_node
>  static inline struct inline_summary *
>  inline_summary (struct cgraph_node *node)
>  {
> -  return &node->local.inline_summary;
> +  return VEC_index (inline_summary_t, inline_summary_vec, node->uid);
>  }
>
>  /* Estimate the growth of the caller when inlining EDGE.  */
> @@ -39,12 +63,8 @@ static inline int
>  estimate_edge_growth (struct cgraph_edge *edge)
>  {
>   int call_stmt_size;
> -  /* ???  We throw away cgraph edges all the time so the information
> -     we store in edges doesn't persist for early inlining.  Ugh.  */
> -  if (!edge->call_stmt)
> -    call_stmt_size = edge->call_stmt_size;
> -  else
> -    call_stmt_size = estimate_num_insns (edge->call_stmt, &eni_size_weights);
> +  call_stmt_size = edge->call_stmt_size;
> +  gcc_checking_assert (call_stmt_size);
>   return (edge->callee->global.size
>          - inline_summary (edge->callee)->size_inlining_benefit
>          - call_stmt_size);
> Index: lto-section-in.c
> ===================================================================
> --- lto-section-in.c    (revision 172396)
> +++ lto-section-in.c    (working copy)
> @@ -58,7 +58,8 @@ const char *lto_section_name[LTO_N_SECTI
>   "reference",
>   "symtab",
>   "opts",
> -  "cgraphopt"
> +  "cgraphopt",
> +  "inline"
>  };
>
>  unsigned char
> Index: ipa.c
> ===================================================================
> --- ipa.c       (revision 172396)
> +++ ipa.c       (working copy)
> @@ -517,6 +517,8 @@ cgraph_remove_unreachable_nodes (bool be
>              }
>          }
>       }
> +  if (file)
> +    fprintf (file, "\n");
>
>  #ifdef ENABLE_CHECKING
>   verify_cgraph ();
> Index: ipa-inline-analysis.c
> ===================================================================
> --- ipa-inline-analysis.c       (revision 172396)
> +++ ipa-inline-analysis.c       (working copy)
> @@ -23,13 +23,13 @@ along with GCC; see the file COPYING3.
>
>    We estimate for each function
>      - function body size
> -     - function runtime
> +     - average function execution time
>      - inlining size benefit (that is how much of function body size
>        and its call sequence is expected to disappear by inlining)
>      - inlining time benefit
>      - function frame size
>    For each call
> -     - call sequence size
> +     - call statement size and time
>
>    inlinie_summary datastructures store above information locally (i.e.
>    parameters of the function itself) and globally (i.e. parameters of
> @@ -61,12 +61,100 @@ along with GCC; see the file COPYING3.
>  #include "ggc.h"
>  #include "tree-flow.h"
>  #include "ipa-prop.h"
> +#include "lto-streamer.h"
>  #include "ipa-inline.h"
>
>  #define MAX_TIME 1000000000
>
>  /* Holders of ipa cgraph hooks: */
>  static struct cgraph_node_hook_list *function_insertion_hook_holder;
> +static struct cgraph_node_hook_list *node_removal_hook_holder;
> +static struct cgraph_2node_hook_list *node_duplication_hook_holder;
> +static void inline_node_removal_hook (struct cgraph_node *, void *);
> +static void inline_node_duplication_hook (struct cgraph_node *,
> +                                         struct cgraph_node *, void *);
> +
> +/* VECtor holding inline summaries.  */
> +VEC(inline_summary_t,heap) *inline_summary_vec;
> +
> +/* Allocate the inline summary vector or resize it to cover all cgraph nodes. */
> +
> +static void
> +inline_summary_alloc (void)
> +{
> +  if (!node_removal_hook_holder)
> +    node_removal_hook_holder =
> +      cgraph_add_node_removal_hook (&inline_node_removal_hook, NULL);
> +  if (!node_duplication_hook_holder)
> +    node_duplication_hook_holder =
> +      cgraph_add_node_duplication_hook (&inline_node_duplication_hook, NULL);
> +
> +  if (VEC_length (inline_summary_t, inline_summary_vec)
> +      <= (unsigned) cgraph_max_uid)
> +    VEC_safe_grow_cleared (inline_summary_t, heap,
> +                          inline_summary_vec, cgraph_max_uid + 1);
> +}
> +
> +/* Hook that is called by cgraph.c when a node is removed.  */
> +
> +static void
> +inline_node_removal_hook (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
> +{
> +  /* During IPA-CP updating we can be called on not-yet analyze clones.  */
> +  if (VEC_length (inline_summary_t, inline_summary_vec)
> +      <= (unsigned)node->uid)
> +    return;
> +  memset (inline_summary (node),
> +         0, sizeof (inline_summary_t));
> +}
> +
> +/* Hook that is called by cgraph.c when a node is duplicated.  */
> +
> +static void
> +inline_node_duplication_hook (struct cgraph_node *src, struct cgraph_node *dst,
> +                             ATTRIBUTE_UNUSED void *data)
> +{
> +  inline_summary_alloc ();
> +  memcpy (inline_summary (dst), inline_summary (src),
> +         sizeof (struct inline_summary));
> +}
> +
> +static void
> +dump_inline_summary (FILE *f, struct cgraph_node *node)
> +{
> +  if (node->analyzed)
> +    {
> +      struct inline_summary *s = inline_summary (node);
> +      fprintf (f, "Inline summary for %s/%i\n", cgraph_node_name (node),
> +              node->uid);
> +      fprintf (f, "  self time:       %i, benefit: %i\n",
> +              s->self_time, s->time_inlining_benefit);
> +      fprintf (f, "  global time:     %i\n", node->global.time);
> +      fprintf (f, "  self size:       %i, benefit: %i\n",
> +              s->self_size, s->size_inlining_benefit);
> +      fprintf (f, "  global size:     %i", node->global.size);
> +      fprintf (f, "  self stack:      %i\n",
> +              (int)s->estimated_self_stack_size);
> +      fprintf (f, "  global stack:    %i\n",
> +              (int)node->global.estimated_stack_size);
> +    }
> +}
> +
> +void
> +debug_inline_summary (struct cgraph_node *node)
> +{
> +  dump_inline_summary (stderr, node);
> +}
> +
> +void
> +dump_inline_summaries (FILE *f)
> +{
> +  struct cgraph_node *node;
> +
> +  for (node = cgraph_nodes; node; node = node->next)
> +    if (node->analyzed)
> +      dump_inline_summary (f, node);
> +}
>
>  /* See if statement might disappear after inlining.
>    0 - means not eliminated
> @@ -179,16 +267,27 @@ estimate_function_body_sizes (struct cgr
>                       freq, this_size, this_time);
>              print_gimple_stmt (dump_file, stmt, 0, 0);
>            }
> +
> +         if (is_gimple_call (stmt))
> +           {
> +             struct cgraph_edge *edge = cgraph_edge (node, stmt);
> +             edge->call_stmt_size = this_size;
> +             edge->call_stmt_time = this_time;
> +           }
> +
>          this_time *= freq;
>          time += this_time;
>          size += this_size;
> +
>          prob = eliminated_by_inlining_prob (stmt);
>          if (prob == 1 && dump_file && (dump_flags & TDF_DETAILS))
>            fprintf (dump_file, "    50%% will be eliminated by inlining\n");
>          if (prob == 2 && dump_file && (dump_flags & TDF_DETAILS))
>            fprintf (dump_file, "    will eliminated by inlining\n");
> +
>          size_inlining_benefit += this_size * prob;
>          time_inlining_benefit += this_time * prob;
> +
>          gcc_assert (time >= 0);
>          gcc_assert (size >= 0);
>        }
> @@ -222,6 +321,8 @@ compute_inline_parameters (struct cgraph
>
>   gcc_assert (!node->global.inlined_to);
>
> +  inline_summary_alloc ();
> +
>   /* Estimate the stack size for the function if we're optimizing.  */
>   self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
>   inline_summary (node)->estimated_self_stack_size = self_stack_size;
> @@ -247,17 +348,7 @@ compute_inline_parameters (struct cgraph
>       node->local.can_change_signature = !e;
>     }
>   estimate_function_body_sizes (node);
> -  /* Compute size of call statements.  We have to do this for callers here,
> -     those sizes need to be present for edges _to_ us as early as
> -     we are finished with early opts.  */
> -  for (e = node->callers; e; e = e->next_caller)
> -    if (e->call_stmt)
> -      {
> -       e->call_stmt_size
> -         = estimate_num_insns (e->call_stmt, &eni_size_weights);
> -       e->call_stmt_time
> -         = estimate_num_insns (e->call_stmt, &eni_time_weights);
> -      }
> +
>   /* Inlining characteristics are maintained by the cgraph_mark_inline.  */
>   node->global.time = inline_summary (node)->self_time;
>   node->global.size = inline_summary (node)->self_size;
> @@ -300,12 +391,8 @@ static inline int
>  estimate_edge_time (struct cgraph_edge *edge)
>  {
>   int call_stmt_time;
> -  /* ???  We throw away cgraph edges all the time so the information
> -     we store in edges doesn't persist for early inlining.  Ugh.  */
> -  if (!edge->call_stmt)
> -    call_stmt_time = edge->call_stmt_time;
> -  else
> -    call_stmt_time = estimate_num_insns (edge->call_stmt, &eni_time_weights);
> +  call_stmt_time = edge->call_stmt_time;
> +  gcc_checking_assert (call_stmt_time);
>   return (((gcov_type)edge->callee->global.time
>           - inline_summary (edge->callee)->time_inlining_benefit
>           - call_stmt_time) * edge->frequency
> @@ -379,8 +466,10 @@ estimate_growth (struct cgraph_node *nod
>   return growth;
>  }
>
> +
>  /* This function performs intraprocedural analysis in NODE that is required to
>    inline indirect calls.  */
> +
>  static void
>  inline_indirect_intraprocedural_analysis (struct cgraph_node *node)
>  {
> @@ -437,8 +526,6 @@ inline_generate_summary (void)
>   for (node = cgraph_nodes; node; node = node->next)
>     if (node->analyzed)
>       inline_analyze_function (node);
> -
> -  return;
>  }
>
>
> @@ -449,6 +536,57 @@ inline_generate_summary (void)
>  void
>  inline_read_summary (void)
>  {
> +  struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
> +  struct lto_file_decl_data *file_data;
> +  unsigned int j = 0;
> +
> +  inline_summary_alloc ();
> +
> +  while ((file_data = file_data_vec[j++]))
> +    {
> +      size_t len;
> +      const char *data = lto_get_section_data (file_data, LTO_section_inline_summary, NULL, &len);
> +
> +      struct lto_input_block *ib
> +       = lto_create_simple_input_block (file_data,
> +                                        LTO_section_inline_summary,
> +                                        &data, &len);
> +      if (ib)
> +       {
> +         unsigned int i;
> +         unsigned int f_count = lto_input_uleb128 (ib);
> +
> +         for (i = 0; i < f_count; i++)
> +           {
> +             unsigned int index;
> +             struct cgraph_node *node;
> +             struct inline_summary *info;
> +             lto_cgraph_encoder_t encoder;
> +
> +             index = lto_input_uleb128 (ib);
> +             encoder = file_data->cgraph_node_encoder;
> +             node = lto_cgraph_encoder_deref (encoder, index);
> +             info = inline_summary (node);
> +
> +             node->global.estimated_stack_size
> +               = info->estimated_self_stack_size = lto_input_uleb128 (ib);
> +             node->global.time = info->self_time = lto_input_uleb128 (ib);
> +             info->time_inlining_benefit = lto_input_uleb128 (ib);
> +             node->global.size = info->self_size = lto_input_uleb128 (ib);
> +             info->size_inlining_benefit = lto_input_uleb128 (ib);
> +             node->global.estimated_growth = INT_MIN;
> +           }
> +
> +         lto_destroy_simple_input_block (file_data,
> +                                         LTO_section_inline_summary,
> +                                         ib, data, len);
> +       }
> +      else
> +       /* Fatal error here.  We do not want to support compiling ltrans units with
> +          different version of compiler or different flags than the WPA unit, so
> +          this should never happen.  */
> +       fatal_error ("ipa reference summary is missing in ltrans unit");
> +    }
>   if (flag_indirect_inlining)
>     {
>       ipa_register_cgraph_hooks ();
> @@ -468,14 +606,57 @@ void
>  inline_write_summary (cgraph_node_set set,
>                      varpool_node_set vset ATTRIBUTE_UNUSED)
>  {
> +  struct cgraph_node *node;
> +  struct lto_simple_output_block *ob
> +    = lto_create_simple_output_block (LTO_section_inline_summary);
> +  lto_cgraph_encoder_t encoder = ob->decl_state->cgraph_node_encoder;
> +  unsigned int count = 0;
> +  int i;
> +
> +  for (i = 0; i < lto_cgraph_encoder_size (encoder); i++)
> +    if (lto_cgraph_encoder_deref (encoder, i)->analyzed)
> +      count++;
> +  lto_output_uleb128_stream (ob->main_stream, count);
> +
> +  for (i = 0; i < lto_cgraph_encoder_size (encoder); i++)
> +    {
> +      node = lto_cgraph_encoder_deref (encoder, i);
> +      if (node->analyzed)
> +       {
> +         struct inline_summary *info = inline_summary (node);
> +         lto_output_uleb128_stream (ob->main_stream,
> +                                    lto_cgraph_encoder_encode (encoder, node));
> +         lto_output_sleb128_stream (ob->main_stream,
> +                                    info->estimated_self_stack_size);
> +         lto_output_sleb128_stream (ob->main_stream,
> +                                    info->self_size);
> +         lto_output_sleb128_stream (ob->main_stream,
> +                                    info->size_inlining_benefit);
> +         lto_output_sleb128_stream (ob->main_stream,
> +                                    info->self_time);
> +         lto_output_sleb128_stream (ob->main_stream,
> +                                    info->time_inlining_benefit);
> +       }
> +    }
> +
>   if (flag_indirect_inlining && !flag_ipa_cp)
>     ipa_prop_write_jump_functions (set);
>  }
>
> +
>  /* Release inline summary.  */
>
>  void
>  inline_free_summary (void)
>  {
> -  cgraph_remove_function_insertion_hook (function_insertion_hook_holder);
> +  if (function_insertion_hook_holder)
> +    cgraph_remove_function_insertion_hook (function_insertion_hook_holder);
> +  function_insertion_hook_holder = NULL;
> +  if (node_removal_hook_holder)
> +    cgraph_remove_node_removal_hook (node_removal_hook_holder);
> +  node_removal_hook_holder = NULL;
> +  if (node_duplication_hook_holder)
> +    cgraph_remove_node_duplication_hook (node_duplication_hook_holder);
> +  node_duplication_hook_holder = NULL;
> +  VEC_free (inline_summary_t, heap, inline_summary_vec);
>  }
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c   (revision 172396)
> +++ lto/lto.c   (working copy)
> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
>  #include "lto-streamer.h"
>  #include "splay-tree.h"
>  #include "params.h"
> +#include "ipa-inline.h"
>
>  static GTY(()) tree first_personality_decl;
>
> @@ -750,7 +751,7 @@ add_cgraph_node_to_partition (ltrans_par
>  {
>   struct cgraph_edge *e;
>
> -  part->insns += node->local.inline_summary.self_size;
> +  part->insns += inline_summary (node)->self_size;
>
>   if (node->aux)
>     {
> @@ -811,7 +812,7 @@ undo_partition (ltrans_partition partiti
>       struct cgraph_node *node = VEC_index (cgraph_node_ptr,
>                                            partition->cgraph_set->nodes,
>                                            n_cgraph_nodes);
> -      partition->insns -= node->local.inline_summary.self_size;
> +      partition->insns -= inline_summary (node)->self_size;
>       cgraph_node_set_remove (partition->cgraph_set, node);
>       node->aux = (void *)((size_t)node->aux - 1);
>     }
> Index: lto/Make-lang.in
> ===================================================================
> --- lto/Make-lang.in    (revision 172396)
> +++ lto/Make-lang.in    (working copy)
> @@ -85,7 +85,8 @@ lto/lto.o: lto/lto.c $(CONFIG_H) $(SYSTE
>        $(CGRAPH_H) $(GGC_H) tree-ssa-operands.h $(TREE_PASS_H) \
>        langhooks.h $(VEC_H) $(BITMAP_H) pointer-set.h $(IPA_PROP_H) \
>        $(COMMON_H) debug.h $(TIMEVAR_H) $(GIMPLE_H) $(LTO_H) $(LTO_TREE_H) \
> -       $(LTO_TAGS_H) $(LTO_STREAMER_H) $(SPLAY_TREE_H) gt-lto-lto.h $(PARAMS_H)
> +       $(LTO_TAGS_H) $(LTO_STREAMER_H) $(SPLAY_TREE_H) gt-lto-lto.h $(PARAMS_H) \
> +       ipa-inline.h
>  lto/lto-object.o: lto/lto-object.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
>        $(DIAGNOSTIC_CORE_H) $(LTO_H) $(TM_H) $(LTO_STREAMER_H) \
>        ../include/simple-object.h
> Index: ipa-prop.c
> ===================================================================
> --- ipa-prop.c  (revision 172396)
> +++ ipa-prop.c  (working copy)
> @@ -1998,7 +1998,7 @@ ipa_edge_duplication_hook (struct cgraph
>
>  static void
>  ipa_node_duplication_hook (struct cgraph_node *src, struct cgraph_node *dst,
> -                          __attribute__((unused)) void *data)
> +                          ATTRIBUTE_UNUSED void *data)
>  {
>   struct ipa_node_params *old_info, *new_info;
>   int param_count, i;
> Index: Makefile.in
> ===================================================================
> --- Makefile.in (revision 172396)
> +++ Makefile.in (working copy)
> @@ -3011,7 +3011,7 @@ ipa-ref.o : ipa-ref.c $(CONFIG_H) $(SYST
>  ipa-cp.o : ipa-cp.c $(CONFIG_H) $(SYSTEM_H) coretypes.h  \
>    $(TREE_H) $(TARGET_H) $(GIMPLE_H) $(CGRAPH_H) $(IPA_PROP_H) $(TREE_FLOW_H) \
>    $(TREE_PASS_H) $(FLAGS_H) $(TIMEVAR_H) $(DIAGNOSTIC_H) $(TREE_DUMP_H) \
> -   $(TREE_INLINE_H) $(FIBHEAP_H) $(PARAMS_H) tree-pretty-print.h
> +   $(TREE_INLINE_H) $(FIBHEAP_H) $(PARAMS_H) tree-pretty-print.h ipa-inline.h
>  ipa-split.o : ipa-split.c $(CONFIG_H) $(SYSTEM_H) coretypes.h  \
>    $(TREE_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H) $(TREE_FLOW_H) \
>    $(TREE_PASS_H) $(FLAGS_H) $(TIMEVAR_H) $(DIAGNOSTIC_H) $(TREE_DUMP_H) \
> @@ -3032,7 +3032,7 @@ ipa-inline-analysis.o : ipa-inline-analy
>    $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
>    $(DIAGNOSTIC_H) $(PARAMS_H) $(TIMEVAR_H) $(TREE_PASS_H) \
>    $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(IPA_PROP_H) \
> -   gimple-pretty-print.h ipa-inline.h
> +   gimple-pretty-print.h ipa-inline.h $(LTO_STREAMER_H)
>  ipa-utils.o : ipa-utils.c $(IPA_UTILS_H) $(CONFIG_H) $(SYSTEM_H) \
>    coretypes.h $(TM_H) $(TREE_H) $(TREE_FLOW_H) $(TREE_INLINE_H) langhooks.h \
>    pointer-set.h $(GGC_H) $(GIMPLE_H) $(SPLAY_TREE_H) \
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h      (revision 172396)
> +++ lto-streamer.h      (working copy)
> @@ -264,6 +264,7 @@ enum lto_section_type
>   LTO_section_symtab,
>   LTO_section_opts,
>   LTO_section_cgraph_opt_sum,
> +  LTO_section_inline_summary,
>   LTO_N_SECTION_TYPES          /* Must be last.  */
>  };
>
>
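
A note on the streaming hunks above: the reader and the writer must agree on both the field order and the encoding. As quoted, inline_write_summary emits stack size, self_size, size_inlining_benefit, self_time, time_inlining_benefit with lto_output_sleb128_stream, while inline_read_summary consumes stack size, self_time, time_inlining_benefit, self_size, size_inlining_benefit with lto_input_uleb128. The following is only a standalone sketch of a symmetric round trip, not GCC's streamer: struct summary_record, put_sleb128, get_sleb128, write_record and read_record are hypothetical names made up for illustration, and it assumes the usual arithmetic right shift on signed values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the streamed fields of struct inline_summary.  */
struct summary_record
{
  int64_t stack_size;
  int self_size;
  int size_benefit;
  int self_time;
  int time_benefit;
};

/* Store VALUE at BUF as signed LEB128; return the number of bytes used.
   Assumes >> on a negative value is an arithmetic shift.  */
static size_t
put_sleb128 (unsigned char *buf, int64_t value)
{
  size_t n = 0;
  int more = 1;
  while (more)
    {
      unsigned char byte = value & 0x7f;
      value >>= 7;
      if ((value == 0 && !(byte & 0x40)) || (value == -1 && (byte & 0x40)))
        more = 0;
      else
        byte |= 0x80;
      buf[n++] = byte;
    }
  return n;
}

/* Decode one signed LEB128 value from BUF; return the bytes consumed.  */
static size_t
get_sleb128 (const unsigned char *buf, int64_t *value)
{
  size_t n = 0;
  unsigned shift = 0;
  uint64_t result = 0;
  unsigned char byte;
  do
    {
      assert (shift < 64);
      byte = buf[n++];
      result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  if (shift < 64 && (byte & 0x40))
    result |= ~(uint64_t) 0 << shift;   /* sign-extend */
  *value = (int64_t) result;
  return n;
}

/* Writer: the field order here is the contract the reader must follow.  */
static size_t
write_record (unsigned char *buf, const struct summary_record *r)
{
  size_t n = 0;
  n += put_sleb128 (buf + n, r->stack_size);
  n += put_sleb128 (buf + n, r->self_size);
  n += put_sleb128 (buf + n, r->size_benefit);
  n += put_sleb128 (buf + n, r->self_time);
  n += put_sleb128 (buf + n, r->time_benefit);
  return n;
}

/* Reader: same encoding, same order as write_record.  */
static size_t
read_record (const unsigned char *buf, struct summary_record *r)
{
  size_t n = 0;
  int64_t v;
  n += get_sleb128 (buf + n, &r->stack_size);
  n += get_sleb128 (buf + n, &v); r->self_size = (int) v;
  n += get_sleb128 (buf + n, &v); r->size_benefit = (int) v;
  n += get_sleb128 (buf + n, &v); r->self_time = (int) v;
  n += get_sleb128 (buf + n, &v); r->time_benefit = (int) v;
  return n;
}
```

Writing a record into a small buffer and reading it back should consume exactly the bytes written and reproduce every field; swapping two reads in read_record would silently exchange size and time.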
Jan Hubicka April 15, 2011, 9:29 a.m. UTC | #2
> >
> > I fixed this and added a sanity check that the fields are initialized.
> > This exposed a problem with early inliner iteration, fixed thusly, and the fact that
> > the early inliner attempts to compute overall growth at a time when the inline
> > parameters are not yet computed for functions not visited by early optimizations.
> > We previously agreed that the early inliner should not try to do that (as this
> > leads to the early inliner inlining functions called once that should be deferred
> > for later consideration).  I just hope it won't cause benchmarks to
> > regress too much ;)
> 
> Yeah, we agreed to that.  And I forgot about it as it wasn't part of the
> early inliner reorg (which was supposed to be a 1:1 transform).

Today's C++ results show some regressions, but nothing earthshaking.  So I think it is a good
idea to drop this feature of the early inliner, since it is not really systematic.
There is also a great improvement on LTO SPEC2000, but I tend to hope it comes from an unrelated change.
Perhaps your aliasing?

Honza
Richard Biener April 15, 2011, 10:09 a.m. UTC | #3
2011/4/15 Jan Hubicka <hubicka@ucw.cz>:
> Today's C++ results show some regressions, but nothing earthshaking.  So I think it is a good
> idea to drop this feature of the early inliner, since it is not really systematic.
> There is also a great improvement on LTO SPEC2000, but I tend to hope it comes from an unrelated change.
> Perhaps your aliasing?

I doubt SPEC2k uses VLAs or alloca, does it?  Might be the DSE
improvements, but I'm not sure.

Richard.

> Honza
>
Jan Hubicka April 15, 2011, 11:23 a.m. UTC | #4
> I doubt SPEC2k uses VLAs or alloca, does it?  Might be the DSE
> improvements, but I'm not sure.

It seems to happen only with LTO, so it might be inlining & fixed call cost estimates. It does not
seem so likely to me however - I know that gzip is touchy about inlining, but vortex seems easy.

Honza
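
For reference, the per-edge growth estimate that the patch now derives from the cached call statement size (instead of recomputing it on the fly) boils down to simple arithmetic. The sketch below mirrors the expression in estimate_edge_growth from the diff, but the types callee_info and call_edge and the function edge_growth are hypothetical stand-ins, not the cgraph structures.

```c
#include <assert.h>

/* Hypothetical stand-ins for the callee's size data and a call edge.  */
struct callee_info
{
  int size;                   /* size of the callee body after inlining into it */
  int size_inlining_benefit;  /* instructions expected to vanish when inlined */
};

struct call_edge
{
  const struct callee_info *callee;
  int call_stmt_size;         /* cached cost of the call statement itself */
};

/* Growth of the caller if EDGE is inlined: the callee body comes in,
   the inlining benefit and the call statement go away.  */
static int
edge_growth (const struct call_edge *edge)
{
  /* Same idea as the sanity check in the patch: a zero size means the
     edge info was never computed.  */
  assert (edge->call_stmt_size > 0);
  return (edge->callee->size
          - edge->callee->size_inlining_benefit
          - edge->call_stmt_size);
}
```

With a callee of size 20, a benefit of 5 and a call statement of size 3, the estimated growth is 12; a result at or below zero means inlining is expected not to grow the caller.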
H.J. Lu April 16, 2011, 10:42 p.m. UTC | #5
On Wed, Apr 13, 2011 at 3:20 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch moves inline_summary from a field in cgraph_node into its own on-side
> data structure. This moves it away from an arcane decision of mine to split all IPA data
> into global/local data stored in a common data structure, and into the scheme we
> developed for new IPA passes some time ago.
>
> The advantage is that the code is more contained and less spread across the
> compiler. We also make cgraph_node smaller and dumps more compact, which never
> hurts.
>
> While working on it I noticed that Richi's patch to introduce cgraph_edge
> times/sizes is a bit iffy in computing data when they are missing in the
> data structure. It also computes incoming edge costs instead of outgoing ones, which
> means that not all edges get their info computed for the IPA inliner
> (think of newly discovered direct calls or IPA merging).
>
> I fixed this and added a sanity check that the fields are initialized.
> This exposed a problem with early inliner iteration, fixed thusly, and the fact that
> the early inliner attempts to compute overall growth at a time when the inline
> parameters are not yet computed for functions not visited by early optimizations.
> We previously agreed that the early inliner should not try to do that (as this
> leads to the early inliner inlining functions called once that should be deferred
> for later consideration).  I just hope it won't cause benchmarks to
> regress too much ;)
>
> Having a place to pile inline analysis info in, there is more to clean up. The
> cgraph_local/cgraph_global fields probably should go, and the stuff from the global
> info should go into the inline_summary data structure, too (the lifetimes are
> essentially the same so there is no need for the split).  I will handle this
> incrementally.
>
> Bootstrapped/regtested x86_64-linux with a slightly modified version of the patch.
> Re-testing with the final version; I intend to commit the patch tomorrow.
>
> Honza
>
>        * cgraph.c (dump_cgraph_node): Do not dump inline summaries.
>        * cgraph.h (struct inline_summary): Move to ipa-inline.h
>        (cgraph_local_info): Remove inline_summary.
>        * ipa-cp.c: Include ipa-inline.h.
>        (ipcp_cloning_candidate_p, ipcp_estimate_growth,
>        ipcp_estimate_cloning_cost, ipcp_insert_stage): Use inline_summary
>        accessor.
>        * lto-cgraph.c (lto_output_node): Do not stream inline summary.
>        (input_overwrite_node): Do not set inline summary.
>        (input_node): Do not stream inline summary.
>        * ipa-inline.c (cgraph_decide_inlining): Dump inline summaries.
>        (cgraph_decide_inlining_incrementally): Do not try to estimate overall
>        growth; we do not have inline parameters computed for that anyway.
>        (cgraph_early_inlining): After inlining compute call_stmt_sizes.
>        * ipa-inline.h (struct inline_summary): Move here from cgraph.h
>        (inline_summary_t): New type and VECtor.
>        (debug_inline_summary, dump_inline_summaries): Declare.
>        (inline_summary): Use VECtor.
>        (estimate_edge_growth): Kill hack computing call stmt size directly.
>        * lto-section-in.c (lto_section_name): Add inline section.
>        * ipa-inline-analysis.c: Include lto-streamer.h
>        (node_removal_hook_holder, node_duplication_hook_holder): New holders
>        (inline_node_removal_hook, inline_node_duplication_hook): New functions.
>        (inline_summary_vec): Define.
>        (inline_summary_alloc, dump_inline_summary, debug_inline_summary,
>        dump_inline_summaries): New functions.
>        (estimate_function_body_sizes): Properly compute size/time of outgoing calls.
>        (compute_inline_parameters): Alloc inline_summary; do not compute size/time
>        of incoming calls.
>        (estimate_edge_time): Avoid missing time summary hack.
>        (inline_read_summary): Read inline summary info.
>        (inline_write_summary): Write inline summary info.
>        (inline_free_summary): Free all hooks and inline summary vector.
>        * lto-streamer.h: Add LTO_section_inline_summary section.
>        * Makefile.in (ipa-cp.o, ipa-inline-analysis.o): Update dependencies.
>        * ipa.c (cgraph_remove_unreachable_nodes): Fix dump file formatting.
>
>        * lto.c: Include ipa-inline.h
>        (add_cgraph_node_to_partition, undo_partition): Use inline_summary accessor.
>        (ipa_node_duplication_hook): Fix declaration.
>        * Make-lang.in (lto.o): Update dependencies.

This may have caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48645
H.J. Lu April 16, 2011, 10:45 p.m. UTC | #6
On Sat, Apr 16, 2011 at 3:42 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> This may have caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48645
>

This may be the same as

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48622

which has a small testcase.
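
Since the failures discussed here involve the new side storage, it may help to see the general shape of the scheme: per-node data lives in a vector indexed by the node's uid rather than inside the node, so every lookup depends on the vector having been grown to cover that uid first. This is a minimal sketch of that idea in plain C, not the VEC-based code from the patch; struct node, struct summary, summary_alloc, summary_of and summary_free are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node carrying only a unique id; the summary lives elsewhere.  */
struct node
{
  int uid;
};

struct summary
{
  int self_size;
  int self_time;
};

static struct summary *summary_vec;
static size_t summary_vec_len;

/* Make sure the side vector covers every uid below MAX_UID,
   zero-filling the entries that are new.  */
static void
summary_alloc (size_t max_uid)
{
  struct summary *p;
  if (max_uid <= summary_vec_len)
    return;
  p = realloc (summary_vec, max_uid * sizeof *p);
  if (!p)
    abort ();
  memset (p + summary_vec_len, 0, (max_uid - summary_vec_len) * sizeof *p);
  summary_vec = p;
  summary_vec_len = max_uid;
}

/* Accessor: a lookup for a uid the vector does not cover is a bug, so
   check for it instead of reading out of bounds.  */
static struct summary *
summary_of (const struct node *n)
{
  assert (n->uid >= 0 && (size_t) n->uid < summary_vec_len);
  return &summary_vec[n->uid];
}

/* Release the side vector and reset the bookkeeping.  */
static void
summary_free (void)
{
  free (summary_vec);
  summary_vec = NULL;
  summary_vec_len = 0;
}
```

The trade-off is the one the patch description names: the node structure gets smaller and the analysis code is self-contained, but nodes created after allocation need a hook (or an explicit regrow) before their summary can be touched.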

Patch

Index: cgraph.c
===================================================================
--- cgraph.c	(revision 172396)
+++ cgraph.c	(working copy)
@@ -1876,22 +1876,6 @@  dump_cgraph_node (FILE *f, struct cgraph
   if (node->count)
     fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
 	     (HOST_WIDEST_INT)node->count);
-  if (node->local.inline_summary.self_time)
-    fprintf (f, " %i time, %i benefit", node->local.inline_summary.self_time,
-    					node->local.inline_summary.time_inlining_benefit);
-  if (node->global.time && node->global.time
-      != node->local.inline_summary.self_time)
-    fprintf (f, " (%i after inlining)", node->global.time);
-  if (node->local.inline_summary.self_size)
-    fprintf (f, " %i size, %i benefit", node->local.inline_summary.self_size,
-    					node->local.inline_summary.size_inlining_benefit);
-  if (node->global.size && node->global.size
-      != node->local.inline_summary.self_size)
-    fprintf (f, " (%i after inlining)", node->global.size);
-  if (node->local.inline_summary.estimated_self_stack_size)
-    fprintf (f, " %i bytes stack usage", (int)node->local.inline_summary.estimated_self_stack_size);
-  if (node->global.estimated_stack_size != node->local.inline_summary.estimated_self_stack_size)
-    fprintf (f, " %i bytes after inlining", (int)node->global.estimated_stack_size);
   if (node->origin)
     fprintf (f, " nested in: %s", cgraph_node_name (node->origin));
   if (node->needed)
Index: cgraph.h
===================================================================
--- cgraph.h	(revision 172396)
+++ cgraph.h	(working copy)
@@ -58,23 +58,6 @@  struct lto_file_decl_data;
 extern const char * const cgraph_availability_names[];
 extern const char * const ld_plugin_symbol_resolution_names[];
 
-/* Function inlining information.  */
-
-struct GTY(()) inline_summary
-{
-  /* Estimated stack frame consumption by the function.  */
-  HOST_WIDE_INT estimated_self_stack_size;
-
-  /* Size of the function body.  */
-  int self_size;
-  /* How many instructions are likely going to disappear after inlining.  */
-  int size_inlining_benefit;
-  /* Estimated time spent executing the function body.  */
-  int self_time;
-  /* How much time is going to be saved by inlining.  */
-  int time_inlining_benefit;
-};
-
 /* Information about thunk, used only for same body aliases.  */
 
 struct GTY(()) cgraph_thunk_info {
@@ -95,8 +78,6 @@  struct GTY(()) cgraph_local_info {
   /* File stream where this node is being written to.  */
   struct lto_file_decl_data * lto_file_data;
 
-  struct inline_summary inline_summary;
-
   /* Set when function function is visible in current compilation unit only
      and its address is never taken.  */
   unsigned local : 1;
Index: ipa-cp.c
===================================================================
--- ipa-cp.c	(revision 172396)
+++ ipa-cp.c	(working copy)
@@ -148,6 +148,7 @@  along with GCC; see the file COPYING3.  
 #include "tree-inline.h"
 #include "fibheap.h"
 #include "params.h"
+#include "ipa-inline.h"
 
 /* Number of functions identified as candidates for cloning. When not cloning
    we can simplify iterate stage not forcing it to go through the decision
@@ -495,7 +496,7 @@  ipcp_cloning_candidate_p (struct cgraph_
  	         cgraph_node_name (node));
       return false;
     }
-  if (node->local.inline_summary.self_size < n_calls)
+  if (inline_summary (node)->self_size < n_calls)
     {
       if (dump_file)
         fprintf (dump_file, "Considering %s for cloning; code would shrink.\n",
@@ -1189,7 +1190,7 @@  ipcp_estimate_growth (struct cgraph_node
      call site.  Precise cost is difficult to get, as our size metric counts
      constants and moves as free.  Generally we are looking for cases that
      small function is called very many times.  */
-  growth = node->local.inline_summary.self_size
+  growth = inline_summary (node)->self_size
   	   - removable_args * redirectable_node_callers;
   if (growth < 0)
     return 0;
@@ -1229,7 +1230,7 @@  ipcp_estimate_cloning_cost (struct cgrap
     cost /= freq_sum * 1000 / REG_BR_PROB_BASE + 1;
   if (dump_file)
     fprintf (dump_file, "Cost of versioning %s is %i, (size: %i, freq: %i)\n",
-             cgraph_node_name (node), cost, node->local.inline_summary.self_size,
+             cgraph_node_name (node), cost, inline_summary (node)->self_size,
 	     freq_sum);
   return cost + 1;
 }
@@ -1364,7 +1365,7 @@  ipcp_insert_stage (void)
       {
 	if (node->count > max_count)
 	  max_count = node->count;
-	overall_size += node->local.inline_summary.self_size;
+	overall_size += inline_summary (node)->self_size;
       }
 
   max_new_size = overall_size;
Index: lto-cgraph.c
===================================================================
--- lto-cgraph.c	(revision 172396)
+++ lto-cgraph.c	(working copy)
@@ -465,16 +465,6 @@  lto_output_node (struct lto_simple_outpu
 
   if (tag == LTO_cgraph_analyzed_node)
     {
-      lto_output_sleb128_stream (ob->main_stream,
-				 node->local.inline_summary.estimated_self_stack_size);
-      lto_output_sleb128_stream (ob->main_stream,
-				 node->local.inline_summary.self_size);
-      lto_output_sleb128_stream (ob->main_stream,
-				 node->local.inline_summary.size_inlining_benefit);
-      lto_output_sleb128_stream (ob->main_stream,
-				 node->local.inline_summary.self_time);
-      lto_output_sleb128_stream (ob->main_stream,
-				 node->local.inline_summary.time_inlining_benefit);
       if (node->global.inlined_to)
 	{
 	  ref = lto_cgraph_encoder_lookup (encoder, node->global.inlined_to);
@@ -930,23 +920,9 @@  input_overwrite_node (struct lto_file_de
 		      struct cgraph_node *node,
 		      enum LTO_cgraph_tags tag,
 		      struct bitpack_d *bp,
-		      unsigned int stack_size,
-		      unsigned int self_time,
-		      unsigned int time_inlining_benefit,
-		      unsigned int self_size,
-		      unsigned int size_inlining_benefit,
 		      enum ld_plugin_symbol_resolution resolution)
 {
   node->aux = (void *) tag;
-  node->local.inline_summary.estimated_self_stack_size = stack_size;
-  node->local.inline_summary.self_time = self_time;
-  node->local.inline_summary.time_inlining_benefit = time_inlining_benefit;
-  node->local.inline_summary.self_size = self_size;
-  node->local.inline_summary.size_inlining_benefit = size_inlining_benefit;
-  node->global.time = self_time;
-  node->global.size = self_size;
-  node->global.estimated_stack_size = stack_size;
-  node->global.estimated_growth = INT_MIN;
   node->local.lto_file_data = file_data;
 
   node->local.local = bp_unpack_value (bp, 1);
@@ -1023,13 +999,8 @@  input_node (struct lto_file_decl_data *f
   tree fn_decl;
   struct cgraph_node *node;
   struct bitpack_d bp;
-  int stack_size = 0;
   unsigned decl_index;
   int ref = LCC_NOT_FOUND, ref2 = LCC_NOT_FOUND;
-  int self_time = 0;
-  int self_size = 0;
-  int time_inlining_benefit = 0;
-  int size_inlining_benefit = 0;
   unsigned long same_body_count = 0;
   int clone_ref;
   enum ld_plugin_symbol_resolution resolution;
@@ -1051,15 +1022,7 @@  input_node (struct lto_file_decl_data *f
   node->count_materialization_scale = lto_input_sleb128 (ib);
 
   if (tag == LTO_cgraph_analyzed_node)
-    {
-      stack_size = lto_input_sleb128 (ib);
-      self_size = lto_input_sleb128 (ib);
-      size_inlining_benefit = lto_input_sleb128 (ib);
-      self_time = lto_input_sleb128 (ib);
-      time_inlining_benefit = lto_input_sleb128 (ib);
-
-      ref = lto_input_sleb128 (ib);
-    }
+    ref = lto_input_sleb128 (ib);
 
   ref2 = lto_input_sleb128 (ib);
 
@@ -1073,9 +1036,7 @@  input_node (struct lto_file_decl_data *f
 
   bp = lto_input_bitpack (ib);
   resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib);
-  input_overwrite_node (file_data, node, tag, &bp, stack_size, self_time,
-  			time_inlining_benefit, self_size,
-			size_inlining_benefit, resolution);
+  input_overwrite_node (file_data, node, tag, &bp, resolution);
 
   /* Store a reference for now, and fix up later to be a pointer.  */
   node->global.inlined_to = (cgraph_node_ptr) (intptr_t) ref;
Index: ipa-inline.c
===================================================================
--- ipa-inline.c	(revision 172396)
+++ ipa-inline.c	(working copy)
@@ -1301,6 +1301,9 @@  cgraph_decide_inlining (void)
 	      max_benefit = benefit;
 	  }
       }
+
+  if (dump_file)
+    dump_inline_summaries (dump_file);
   gcc_assert (in_lto_p
 	      || !max_count
 	      || (profile_info && flag_branch_probabilities));
@@ -1558,8 +1561,7 @@  cgraph_decide_inlining_incrementally (st
       /* When the function body would grow and inlining the function
 	 won't eliminate the need for offline copy of the function,
 	 don't inline.  */
-      if (estimate_edge_growth (e) > allowed_growth
-	  && estimate_growth (e->callee) > allowed_growth)
+      if (estimate_edge_growth (e) > allowed_growth)
 	{
 	  if (dump_file)
 	    fprintf (dump_file,
@@ -1601,6 +1603,7 @@  static unsigned int
 cgraph_early_inlining (void)
 {
   struct cgraph_node *node = cgraph_get_node (current_function_decl);
+  struct cgraph_edge *edge;
   unsigned int todo = 0;
   int iterations = 0;
   bool inlined = false;
@@ -1652,6 +1655,19 @@  cgraph_early_inlining (void)
     {
       timevar_push (TV_INTEGRATION);
       todo |= optimize_inline_calls (current_function_decl);
+
+      /* Technically we ought to recompute inline parameters so the new iteration of
+	 early inliner works as expected.  We however have values approximately right
+	 and thus we only need to update edge info that might be cleared out for
+	 newly discovered edges.  */
+      for (edge = node->callees; edge; edge = edge->next_callee)
+	{
+	  edge->call_stmt_size
+	    = estimate_num_insns (edge->call_stmt, &eni_size_weights);
+	  edge->call_stmt_time
+	    = estimate_num_insns (edge->call_stmt, &eni_time_weights);
+	}
+
       timevar_pop (TV_INTEGRATION);
     }
 
Index: ipa-inline.h
===================================================================
--- ipa-inline.h	(revision 172396)
+++ ipa-inline.h	(working copy)
@@ -19,6 +19,30 @@  You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+/* Function inlining information.  */
+
+struct inline_summary
+{
+  /* Estimated stack frame consumption by the function.  */
+  HOST_WIDE_INT estimated_self_stack_size;
+
+  /* Size of the function body.  */
+  int self_size;
+  /* How many instructions are likely going to disappear after inlining.  */
+  int size_inlining_benefit;
+  /* Estimated time spent executing the function body.  */
+  int self_time;
+  /* How much time is going to be saved by inlining.  */
+  int time_inlining_benefit;
+};
+
+typedef struct inline_summary inline_summary_t;
+DEF_VEC_O(inline_summary_t);
+DEF_VEC_ALLOC_O(inline_summary_t,heap);
+extern VEC(inline_summary_t,heap) *inline_summary_vec;
+
+void debug_inline_summary (struct cgraph_node *);
+void dump_inline_summaries (FILE *f);
 void inline_generate_summary (void);
 void inline_read_summary (void);
 void inline_write_summary (cgraph_node_set, varpool_node_set);
@@ -30,7 +54,7 @@  int estimate_growth (struct cgraph_node 
 static inline struct inline_summary *
 inline_summary (struct cgraph_node *node)
 {
-  return &node->local.inline_summary;
+  return VEC_index (inline_summary_t, inline_summary_vec, node->uid);
 }
 
 /* Estimate the growth of the caller when inlining EDGE.  */
@@ -39,12 +63,8 @@  static inline int
 estimate_edge_growth (struct cgraph_edge *edge)
 {
   int call_stmt_size;
-  /* ???  We throw away cgraph edges all the time so the information
-     we store in edges doesn't persist for early inlining.  Ugh.  */
-  if (!edge->call_stmt)
-    call_stmt_size = edge->call_stmt_size;
-  else
-    call_stmt_size = estimate_num_insns (edge->call_stmt, &eni_size_weights);
+  call_stmt_size = edge->call_stmt_size;
+  gcc_checking_assert (call_stmt_size);
   return (edge->callee->global.size
 	  - inline_summary (edge->callee)->size_inlining_benefit
 	  - call_stmt_size);
Index: lto-section-in.c
===================================================================
--- lto-section-in.c	(revision 172396)
+++ lto-section-in.c	(working copy)
@@ -58,7 +58,8 @@  const char *lto_section_name[LTO_N_SECTI
   "reference",
   "symtab",
   "opts",
-  "cgraphopt"
+  "cgraphopt",
+  "inline"
 };
 
 unsigned char
Index: ipa.c
===================================================================
--- ipa.c	(revision 172396)
+++ ipa.c	(working copy)
@@ -517,6 +517,8 @@  cgraph_remove_unreachable_nodes (bool be
 	      }
 	  }
       }
+  if (file)
+    fprintf (file, "\n");
 
 #ifdef ENABLE_CHECKING
   verify_cgraph ();
Index: ipa-inline-analysis.c
===================================================================
--- ipa-inline-analysis.c	(revision 172396)
+++ ipa-inline-analysis.c	(working copy)
@@ -23,13 +23,13 @@  along with GCC; see the file COPYING3.  
 
    We estimate for each function
      - function body size
-     - function runtime
+     - average function execution time
      - inlining size benefit (that is how much of function body size
        and its call sequence is expected to disappear by inlining)
      - inlining time benefit
      - function frame size
    For each call
-     - call sequence size
+     - call statement size and time
 
    inlinie_summary datastructures store above information locally (i.e.
    parameters of the function itself) and globally (i.e. parameters of
@@ -61,12 +61,100 @@  along with GCC; see the file COPYING3.  
 #include "ggc.h"
 #include "tree-flow.h"
 #include "ipa-prop.h"
+#include "lto-streamer.h"
 #include "ipa-inline.h"
 
 #define MAX_TIME 1000000000
 
 /* Holders of ipa cgraph hooks: */
 static struct cgraph_node_hook_list *function_insertion_hook_holder;
+static struct cgraph_node_hook_list *node_removal_hook_holder;
+static struct cgraph_2node_hook_list *node_duplication_hook_holder;
+static void inline_node_removal_hook (struct cgraph_node *, void *);
+static void inline_node_duplication_hook (struct cgraph_node *,
+					  struct cgraph_node *, void *);
+
+/* VECtor holding inline summaries.  */
+VEC(inline_summary_t,heap) *inline_summary_vec;
+
+/* Allocate the inline summary vector or resize it to cover all cgraph nodes. */
+
+static void
+inline_summary_alloc (void)
+{
+  if (!node_removal_hook_holder)
+    node_removal_hook_holder =
+      cgraph_add_node_removal_hook (&inline_node_removal_hook, NULL);
+  if (!node_duplication_hook_holder)
+    node_duplication_hook_holder =
+      cgraph_add_node_duplication_hook (&inline_node_duplication_hook, NULL);
+
+  if (VEC_length (inline_summary_t, inline_summary_vec)
+      <= (unsigned) cgraph_max_uid)
+    VEC_safe_grow_cleared (inline_summary_t, heap,
+			   inline_summary_vec, cgraph_max_uid + 1);
+}
+
+/* Hook that is called by cgraph.c when a node is removed.  */
+
+static void
+inline_node_removal_hook (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
+{
+  /* During IPA-CP updating we can be called on not-yet-analyzed clones.  */
+  if (VEC_length (inline_summary_t, inline_summary_vec)
+      <= (unsigned)node->uid)
+    return;
+  memset (inline_summary (node),
+	  0, sizeof (inline_summary_t));
+}
+
+/* Hook that is called by cgraph.c when a node is duplicated.  */
+
+static void
+inline_node_duplication_hook (struct cgraph_node *src, struct cgraph_node *dst,
+			      ATTRIBUTE_UNUSED void *data)
+{
+  inline_summary_alloc ();
+  memcpy (inline_summary (dst), inline_summary (src),
+	  sizeof (struct inline_summary));
+}
+
+static void
+dump_inline_summary (FILE *f, struct cgraph_node *node)
+{
+  if (node->analyzed)
+    {
+      struct inline_summary *s = inline_summary (node);
+      fprintf (f, "Inline summary for %s/%i\n", cgraph_node_name (node),
+	       node->uid);
+      fprintf (f, "  self time:       %i, benefit: %i\n",
+      	       s->self_time, s->time_inlining_benefit);
+      fprintf (f, "  global time:     %i\n", node->global.time);
+      fprintf (f, "  self size:       %i, benefit: %i\n",
+	       s->self_size, s->size_inlining_benefit);
+      fprintf (f, "  global size:     %i\n", node->global.size);
+      fprintf (f, "  self stack:      %i\n",
+	       (int)s->estimated_self_stack_size);
+      fprintf (f, "  global stack:    %i\n",
+	       (int)node->global.estimated_stack_size);
+    }
+}
+
+void
+debug_inline_summary (struct cgraph_node *node)
+{
+  dump_inline_summary (stderr, node);
+}
+
+void
+dump_inline_summaries (FILE *f)
+{
+  struct cgraph_node *node;
+
+  for (node = cgraph_nodes; node; node = node->next)
+    if (node->analyzed)
+      dump_inline_summary (f, node);
+}
 
 /* See if statement might disappear after inlining.
    0 - means not eliminated
@@ -179,16 +267,27 @@  estimate_function_body_sizes (struct cgr
 		       freq, this_size, this_time);
 	      print_gimple_stmt (dump_file, stmt, 0, 0);
 	    }
+
+	  if (is_gimple_call (stmt))
+	    {
+	      struct cgraph_edge *edge = cgraph_edge (node, stmt);
+	      edge->call_stmt_size = this_size;
+	      edge->call_stmt_time = this_time;
+	    }
+
 	  this_time *= freq;
 	  time += this_time;
 	  size += this_size;
+
 	  prob = eliminated_by_inlining_prob (stmt);
 	  if (prob == 1 && dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "    50%% will be eliminated by inlining\n");
 	  if (prob == 2 && dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "    will eliminated by inlining\n");
+
 	  size_inlining_benefit += this_size * prob;
 	  time_inlining_benefit += this_time * prob;
+
 	  gcc_assert (time >= 0);
 	  gcc_assert (size >= 0);
 	}
@@ -222,6 +321,8 @@  compute_inline_parameters (struct cgraph
 
   gcc_assert (!node->global.inlined_to);
 
+  inline_summary_alloc ();
+
   /* Estimate the stack size for the function if we're optimizing.  */
   self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
   inline_summary (node)->estimated_self_stack_size = self_stack_size;
@@ -247,17 +348,7 @@  compute_inline_parameters (struct cgraph
       node->local.can_change_signature = !e;
     }
   estimate_function_body_sizes (node);
-  /* Compute size of call statements.  We have to do this for callers here,
-     those sizes need to be present for edges _to_ us as early as
-     we are finished with early opts.  */
-  for (e = node->callers; e; e = e->next_caller)
-    if (e->call_stmt)
-      {
-	e->call_stmt_size
-	  = estimate_num_insns (e->call_stmt, &eni_size_weights);
-	e->call_stmt_time
-	  = estimate_num_insns (e->call_stmt, &eni_time_weights);
-      }
+
   /* Inlining characteristics are maintained by the cgraph_mark_inline.  */
   node->global.time = inline_summary (node)->self_time;
   node->global.size = inline_summary (node)->self_size;
@@ -300,12 +391,8 @@  static inline int
 estimate_edge_time (struct cgraph_edge *edge)
 {
   int call_stmt_time;
-  /* ???  We throw away cgraph edges all the time so the information
-     we store in edges doesn't persist for early inlining.  Ugh.  */
-  if (!edge->call_stmt)
-    call_stmt_time = edge->call_stmt_time;
-  else
-    call_stmt_time = estimate_num_insns (edge->call_stmt, &eni_time_weights);
+  call_stmt_time = edge->call_stmt_time;
+  gcc_checking_assert (call_stmt_time);
   return (((gcov_type)edge->callee->global.time
 	   - inline_summary (edge->callee)->time_inlining_benefit
 	   - call_stmt_time) * edge->frequency
@@ -379,8 +466,10 @@  estimate_growth (struct cgraph_node *nod
   return growth;
 }
 
+
 /* This function performs intraprocedural analysis in NODE that is required to
    inline indirect calls.  */
+
 static void
 inline_indirect_intraprocedural_analysis (struct cgraph_node *node)
 {
@@ -437,8 +526,6 @@  inline_generate_summary (void)
   for (node = cgraph_nodes; node; node = node->next)
     if (node->analyzed)
       inline_analyze_function (node);
-
-  return;
 }
 
 
@@ -449,6 +536,57 @@  inline_generate_summary (void)
 void
 inline_read_summary (void)
 {
+  struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
+  struct lto_file_decl_data *file_data;
+  unsigned int j = 0;
+
+  inline_summary_alloc ();
+
+  while ((file_data = file_data_vec[j++]))
+    {
+      size_t len;
+      const char *data;
+
+      struct lto_input_block *ib
+	= lto_create_simple_input_block (file_data,
+					 LTO_section_inline_summary,
+					 &data, &len);
+      if (ib)
+	{
+	  unsigned int i;
+	  unsigned int f_count = lto_input_uleb128 (ib);
+
+	  for (i = 0; i < f_count; i++)
+	    {
+	      unsigned int index;
+	      struct cgraph_node *node;
+	      struct inline_summary *info;
+	      lto_cgraph_encoder_t encoder;
+
+	      index = lto_input_uleb128 (ib);
+	      encoder = file_data->cgraph_node_encoder;
+	      node = lto_cgraph_encoder_deref (encoder, index);
+	      info = inline_summary (node);
+
+	      node->global.estimated_stack_size
+	        = info->estimated_self_stack_size = lto_input_sleb128 (ib);
+	      node->global.size = info->self_size = lto_input_sleb128 (ib);
+	      info->size_inlining_benefit = lto_input_sleb128 (ib);
+	      node->global.time = info->self_time = lto_input_sleb128 (ib);
+	      info->time_inlining_benefit = lto_input_sleb128 (ib);
+	      node->global.estimated_growth = INT_MIN;
+	    }
+
+	  lto_destroy_simple_input_block (file_data,
+					  LTO_section_inline_summary,
+					  ib, data, len);
+	}
+      else
+	/* Fatal error here.  We do not want to support compiling ltrans units
+	   with a different compiler version or different flags than the WPA
+	   unit, so this should never happen.  */
+	fatal_error ("ipa inline summary is missing in ltrans unit");
+    }
   if (flag_indirect_inlining)
     {
       ipa_register_cgraph_hooks ();
@@ -468,14 +606,57 @@  void
 inline_write_summary (cgraph_node_set set,
 		      varpool_node_set vset ATTRIBUTE_UNUSED)
 {
+  struct cgraph_node *node;
+  struct lto_simple_output_block *ob
+    = lto_create_simple_output_block (LTO_section_inline_summary);
+  lto_cgraph_encoder_t encoder = ob->decl_state->cgraph_node_encoder;
+  unsigned int count = 0;
+  int i;
+
+  for (i = 0; i < lto_cgraph_encoder_size (encoder); i++)
+    if (lto_cgraph_encoder_deref (encoder, i)->analyzed)
+      count++;
+  lto_output_uleb128_stream (ob->main_stream, count);
+
+  for (i = 0; i < lto_cgraph_encoder_size (encoder); i++)
+    {
+      node = lto_cgraph_encoder_deref (encoder, i);
+      if (node->analyzed)
+	{
+	  struct inline_summary *info = inline_summary (node);
+	  lto_output_uleb128_stream (ob->main_stream,
+				     lto_cgraph_encoder_encode (encoder, node));
+	  lto_output_sleb128_stream (ob->main_stream,
+				     info->estimated_self_stack_size);
+	  lto_output_sleb128_stream (ob->main_stream, info->self_size);
+	  lto_output_sleb128_stream (ob->main_stream,
+				     info->size_inlining_benefit);
+	  lto_output_sleb128_stream (ob->main_stream,
+				     info->self_time);
+	  lto_output_sleb128_stream (ob->main_stream,
+				     info->time_inlining_benefit);
+	}
+    }
+  lto_destroy_simple_output_block (ob);
+
   if (flag_indirect_inlining && !flag_ipa_cp)
     ipa_prop_write_jump_functions (set);
 }
 
+
 /* Release inline summary.  */
 
 void
 inline_free_summary (void)
 {
-  cgraph_remove_function_insertion_hook (function_insertion_hook_holder);
+  if (function_insertion_hook_holder)
+    cgraph_remove_function_insertion_hook (function_insertion_hook_holder);
+  function_insertion_hook_holder = NULL;
+  if (node_removal_hook_holder)
+    cgraph_remove_node_removal_hook (node_removal_hook_holder);
+  node_removal_hook_holder = NULL;
+  if (node_duplication_hook_holder)
+    cgraph_remove_node_duplication_hook (node_duplication_hook_holder);
+  node_duplication_hook_holder = NULL;
+  VEC_free (inline_summary_t, heap, inline_summary_vec);
 }
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 172396)
+++ lto/lto.c	(working copy)
@@ -44,6 +44,7 @@  along with GCC; see the file COPYING3.  
 #include "lto-streamer.h"
 #include "splay-tree.h"
 #include "params.h"
+#include "ipa-inline.h"
 
 static GTY(()) tree first_personality_decl;
 
@@ -750,7 +751,7 @@  add_cgraph_node_to_partition (ltrans_par
 {
   struct cgraph_edge *e;
 
-  part->insns += node->local.inline_summary.self_size;
+  part->insns += inline_summary (node)->self_size;
 
   if (node->aux)
     {
@@ -811,7 +812,7 @@  undo_partition (ltrans_partition partiti
       struct cgraph_node *node = VEC_index (cgraph_node_ptr,
 					    partition->cgraph_set->nodes,
 					    n_cgraph_nodes);
-      partition->insns -= node->local.inline_summary.self_size;
+      partition->insns -= inline_summary (node)->self_size;
       cgraph_node_set_remove (partition->cgraph_set, node);
       node->aux = (void *)((size_t)node->aux - 1);
     }
Index: lto/Make-lang.in
===================================================================
--- lto/Make-lang.in	(revision 172396)
+++ lto/Make-lang.in	(working copy)
@@ -85,7 +85,8 @@  lto/lto.o: lto/lto.c $(CONFIG_H) $(SYSTE
 	$(CGRAPH_H) $(GGC_H) tree-ssa-operands.h $(TREE_PASS_H) \
 	langhooks.h $(VEC_H) $(BITMAP_H) pointer-set.h $(IPA_PROP_H) \
 	$(COMMON_H) debug.h $(TIMEVAR_H) $(GIMPLE_H) $(LTO_H) $(LTO_TREE_H) \
-	$(LTO_TAGS_H) $(LTO_STREAMER_H) $(SPLAY_TREE_H) gt-lto-lto.h $(PARAMS_H)
+	$(LTO_TAGS_H) $(LTO_STREAMER_H) $(SPLAY_TREE_H) gt-lto-lto.h $(PARAMS_H) \
+	ipa-inline.h
 lto/lto-object.o: lto/lto-object.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
 	$(DIAGNOSTIC_CORE_H) $(LTO_H) $(TM_H) $(LTO_STREAMER_H) \
 	../include/simple-object.h
Index: ipa-prop.c
===================================================================
--- ipa-prop.c	(revision 172396)
+++ ipa-prop.c	(working copy)
@@ -1998,7 +1998,7 @@  ipa_edge_duplication_hook (struct cgraph
 
 static void
 ipa_node_duplication_hook (struct cgraph_node *src, struct cgraph_node *dst,
-			   __attribute__((unused)) void *data)
+			   ATTRIBUTE_UNUSED void *data)
 {
   struct ipa_node_params *old_info, *new_info;
   int param_count, i;
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 172396)
+++ Makefile.in	(working copy)
@@ -3011,7 +3011,7 @@  ipa-ref.o : ipa-ref.c $(CONFIG_H) $(SYST
 ipa-cp.o : ipa-cp.c $(CONFIG_H) $(SYSTEM_H) coretypes.h  \
    $(TREE_H) $(TARGET_H) $(GIMPLE_H) $(CGRAPH_H) $(IPA_PROP_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(FLAGS_H) $(TIMEVAR_H) $(DIAGNOSTIC_H) $(TREE_DUMP_H) \
-   $(TREE_INLINE_H) $(FIBHEAP_H) $(PARAMS_H) tree-pretty-print.h
+   $(TREE_INLINE_H) $(FIBHEAP_H) $(PARAMS_H) tree-pretty-print.h ipa-inline.h
 ipa-split.o : ipa-split.c $(CONFIG_H) $(SYSTEM_H) coretypes.h  \
    $(TREE_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(FLAGS_H) $(TIMEVAR_H) $(DIAGNOSTIC_H) $(TREE_DUMP_H) \
@@ -3032,7 +3032,7 @@  ipa-inline-analysis.o : ipa-inline-analy
    $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
    $(DIAGNOSTIC_H) $(PARAMS_H) $(TIMEVAR_H) $(TREE_PASS_H) \
    $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(IPA_PROP_H) \
-   gimple-pretty-print.h ipa-inline.h
+   gimple-pretty-print.h ipa-inline.h $(LTO_STREAMER_H)
 ipa-utils.o : ipa-utils.c $(IPA_UTILS_H) $(CONFIG_H) $(SYSTEM_H) \
    coretypes.h $(TM_H) $(TREE_H) $(TREE_FLOW_H) $(TREE_INLINE_H) langhooks.h \
    pointer-set.h $(GGC_H) $(GIMPLE_H) $(SPLAY_TREE_H) \
Index: lto-streamer.h
===================================================================
--- lto-streamer.h	(revision 172396)
+++ lto-streamer.h	(working copy)
@@ -264,6 +264,7 @@  enum lto_section_type
   LTO_section_symtab,
   LTO_section_opts,
   LTO_section_cgraph_opt_sum,
+  LTO_section_inline_summary,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };