[RFC] LTO: IPA inline speed up for large apps (Chrome)

Message ID 54E376FC.9080709@suse.cz
State New

Commit Message

Martin Liška Feb. 17, 2015, 5:14 p.m. UTC
Hello.

After debugging LTO of Chrome, Honza and I noticed that the WPA phase takes quite a long time.
The following patch is an attempt to cache IPA inliner predicates that are constant during
inline_small_functions.

As you can see in the attached report, this patch can reduce the time spent in WPA by ~40%, which
is a really big improvement. A disadvantage of the solution is that the patch adds 6 new bitfields
(an init bit and a value bit for each of 3 predicates) to the cgraph_node class. We could move these
flags to inline_summary, but as that struct is not accessible from cgraph.h, the predicates could no
longer be inlined, and inlining is crucial for them.
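
The caching idea can be illustrated with a minimal, self-contained sketch. It is not the GCC
code: `node`, `can_remove_compute_p` and `can_remove_p` are hypothetical stand-ins, and the
expensive alias walk is replaced by a trivial check. Only the pattern is the point: one init
bit plus one value bit per predicate, filled lazily while the cache is enabled.

```cpp
#include <cassert>

// Hypothetical stand-in for cgraph_node; only the caching pattern is modeled.
struct node
{
  bool address_taken = false;
  int compute_calls = 0;        // counts how often the slow path runs

  // One "init" bit plus one "value" bit for the cached predicate.
  unsigned removable_init : 1;
  unsigned removable : 1;

  node () : removable_init (0), removable (0) {}

  // Uncached predicate; stands in for a walk over all aliases.
  bool can_remove_compute_p ()
  {
    ++compute_calls;
    return !address_taken;
  }

  // Cached wrapper: computes at most once while the cache is enabled,
  // and falls back to the uncached predicate otherwise.
  bool can_remove_p (bool cache_enabled)
  {
    if (!cache_enabled)
      return can_remove_compute_p ();
    if (!removable_init)
      {
        removable = can_remove_compute_p ();
        removable_init = 1;
      }
    return removable;
  }
};
```

Calling `can_remove_p (true)` twice on the same node runs the slow path once. The cache is
only valid while the inputs of the predicate do not change, which is why the patch enables it
just for the duration of inline_small_functions.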

I welcome any ideas about the solution, and I'm not sure whether it is acceptable for stage 4.
That is the reason why no ChangeLog entry is prepared.

Thanks,
Martin
Hello.

The following mini patchset is a speed-up for LTO WPA, measured on the Chromium binary:

Before:
Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1977 kB ( 0%) ggc
 phase opt and generate  : 179.87 (66%) usr   1.67 (45%) sys 181.47 (66%) wall 2682287 kB (13%) ggc
 phase stream in         :  92.75 (34%) usr   2.05 (55%) sys  94.77 (34%) wall 18738391 kB (87%) ggc
 callgraph optimization  :   0.71 ( 0%) usr   0.00 ( 0%) sys   0.71 ( 0%) wall      16 kB ( 0%) ggc
 ipa dead code removal   :   5.20 ( 2%) usr   0.05 ( 1%) sys   5.26 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :   3.22 ( 1%) usr   0.03 ( 1%) sys   3.20 ( 1%) wall       0 kB ( 0%) ggc
 ipa devirtualization    :   0.28 ( 0%) usr   0.01 ( 0%) sys   0.26 ( 0%) wall   32638 kB ( 0%) ggc
 ipa cp                  :   4.27 ( 2%) usr   0.24 ( 6%) sys   4.55 ( 2%) wall  851324 kB ( 4%) ggc
 ipa inlining heuristics : 127.09 (47%) usr   0.27 ( 7%) sys 127.25 (46%) wall  807884 kB ( 4%) ggc
 ipa comdats             :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.57 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   5.47 ( 2%) usr   0.92 (25%) sys   6.37 ( 2%) wall 1370242 kB ( 6%) ggc
 ipa lto decl in         :  79.23 (29%) usr   1.32 (35%) sys  80.53 (29%) wall 16957392 kB (79%) ggc
 ipa lto constructors in :   0.33 ( 0%) usr   0.03 ( 1%) sys   0.44 ( 0%) wall   22897 kB ( 0%) ggc
 ipa lto cgraph I/O      :   1.41 ( 1%) usr   0.21 ( 6%) sys   1.62 ( 1%) wall  901987 kB ( 4%) ggc
 ipa lto decl merge      :   3.22 ( 1%) usr   0.00 ( 0%) sys   3.22 ( 1%) wall   16383 kB ( 0%) ggc
 ipa lto cgraph merge    :   5.10 ( 2%) usr   0.01 ( 0%) sys   5.11 ( 2%) wall   20432 kB ( 0%) ggc
 whopr wpa               :   1.95 ( 1%) usr   0.00 ( 0%) sys   1.94 ( 1%) wall       2 kB ( 0%) ggc
 whopr partitioning      :   5.22 ( 2%) usr   0.01 ( 0%) sys   5.23 ( 2%) wall    7800 kB ( 0%) ggc
 ipa reference           :   2.97 ( 1%) usr   0.06 ( 2%) sys   3.02 ( 1%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.52 ( 0%) usr   0.04 ( 1%) sys   0.56 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   3.51 ( 1%) usr   0.04 ( 1%) sys   3.56 ( 1%) wall       0 kB ( 0%) ggc
 ipa icf                 :  19.33 ( 7%) usr   0.12 ( 3%) sys  19.52 ( 7%) wall    3089 kB ( 0%) ggc
 tree SSA rewrite        :   0.35 ( 0%) usr   0.02 ( 1%) sys   0.37 ( 0%) wall   51191 kB ( 0%) ggc
 tree SSA other          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental    :   0.48 ( 0%) usr   0.06 ( 2%) sys   0.37 ( 0%) wall   33552 kB ( 0%) ggc
 tree operand scan       :   0.41 ( 0%) usr   0.08 ( 2%) sys   0.53 ( 0%) wall  343835 kB ( 2%) ggc
 dominance frontiers     :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.36 ( 0%) usr   0.09 ( 2%) sys   0.55 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.03 ( 0%) usr   0.03 ( 1%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 loop fini               :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   1.18 ( 0%) usr   0.00 ( 0%) sys   1.19 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 272.63             3.72           276.25           21422657 kB

After:

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1977 kB ( 0%) ggc
 phase opt and generate  :  73.30 (43%) usr   1.79 (44%) sys  75.06 (43%) wall 2682287 kB (13%) ggc
 phase stream in         :  95.72 (57%) usr   2.25 (56%) sys  97.94 (57%) wall 18738391 kB (87%) ggc
 callgraph optimization  :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.76 ( 0%) wall      16 kB ( 0%) ggc
 ipa dead code removal   :   5.19 ( 3%) usr   0.03 ( 1%) sys   5.25 ( 3%) wall       0 kB ( 0%) ggc
 ipa virtual call target :   2.81 ( 2%) usr   0.03 ( 1%) sys   3.15 ( 2%) wall       0 kB ( 0%) ggc
 ipa devirtualization    :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall   32638 kB ( 0%) ggc
 ipa cp                  :   4.59 ( 3%) usr   0.24 ( 6%) sys   4.76 ( 3%) wall  851324 kB ( 4%) ggc
 ipa inlining heuristics :  22.09 (13%) usr   0.26 ( 6%) sys  22.20 (13%) wall  807884 kB ( 4%) ggc
 ipa comdats             :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.57 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   5.67 ( 3%) usr   0.93 (23%) sys   6.51 ( 4%) wall 1370242 kB ( 6%) ggc
 ipa lto decl in         :  81.86 (48%) usr   1.45 (36%) sys  83.29 (48%) wall 16957392 kB (79%) ggc
 ipa lto constructors in :   0.41 ( 0%) usr   0.09 ( 2%) sys   0.36 ( 0%) wall   22897 kB ( 0%) ggc
 ipa lto cgraph I/O      :   1.49 ( 1%) usr   0.25 ( 6%) sys   1.73 ( 1%) wall  901987 kB ( 4%) ggc
 ipa lto decl merge      :   3.55 ( 2%) usr   0.00 ( 0%) sys   3.55 ( 2%) wall   16383 kB ( 0%) ggc
 ipa lto cgraph merge    :   5.05 ( 3%) usr   0.00 ( 0%) sys   5.07 ( 3%) wall   20432 kB ( 0%) ggc
 whopr wpa               :   1.88 ( 1%) usr   0.00 ( 0%) sys   1.86 ( 1%) wall       2 kB ( 0%) ggc
 whopr partitioning      :   4.89 ( 3%) usr   0.02 ( 0%) sys   4.90 ( 3%) wall    7800 kB ( 0%) ggc
 ipa reference           :   2.85 ( 2%) usr   0.05 ( 1%) sys   2.91 ( 2%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.55 ( 0%) usr   0.04 ( 1%) sys   0.59 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   3.28 ( 2%) usr   0.04 ( 1%) sys   3.33 ( 2%) wall       0 kB ( 0%) ggc
 ipa icf                 :  18.23 (11%) usr   0.12 ( 3%) sys  18.29 (11%) wall    3089 kB ( 0%) ggc
 tree SSA rewrite        :   0.26 ( 0%) usr   0.04 ( 1%) sys   0.32 ( 0%) wall   51191 kB ( 0%) ggc
 tree SSA other          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental    :   0.51 ( 0%) usr   0.16 ( 4%) sys   0.60 ( 0%) wall   33552 kB ( 0%) ggc
 tree operand scan       :   0.36 ( 0%) usr   0.13 ( 3%) sys   0.49 ( 0%) wall  343835 kB ( 2%) ggc
 dominance frontiers     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.39 ( 0%) usr   0.06 ( 1%) sys   0.63 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.05 ( 0%) usr   0.04 ( 1%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 loop fini               :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   1.26 ( 1%) usr   0.00 ( 0%) sys   1.26 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                 : 169.02             4.04           173.00           21422657 kB
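
For reference, the relative savings can be recomputed from the wall-clock columns of the two
reports. The helper below is only an illustration (the function and constant names are made
up); the numbers are copied from the TOTAL and "ipa inlining heuristics" rows.

```cpp
// Percentage by which AFTER is lower than BEFORE.
constexpr double reduction_percent (double before, double after)
{
  return (before - after) / before * 100.0;
}

// Wall-clock seconds taken from the Before and After reports.
constexpr double total_wall_before = 276.25, total_wall_after = 173.00;
constexpr double inline_wall_before = 127.25, inline_wall_after = 22.20;
```

`reduction_percent (total_wall_before, total_wall_after)` gives about 37.4, in line with the
roughly 40% quoted for the whole WPA run, and `reduction_percent (inline_wall_before,
inline_wall_after)` gives about 82.6 for the inlining heuristics alone.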

perf report after:

    10.17%  lto1-wpa  lto1               [.] inflate_fast
     3.74%  lto1-wpa  lto1               [.] compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
     3.56%  lto1-wpa  lto1               [.] streamer_read_uhwi(lto_input_block*)
     3.16%  lto1-wpa  lto1               [.] ht_lookup_with_hash(ht*, unsigned char const*, unsigned long, unsigned int, ht_lookup_option)
     3.01%  lto1-wpa  lto1               [.] unify_scc(streamer_tree_cache_d*, unsigned int, unsigned int, unsigned int, unsigned int)
     2.69%  lto1-wpa  lto1               [.] streamer_read_tree_bitfields(lto_input_block*, data_in*, tree_node*)
     2.16%  lto1-wpa  lto1               [.] lto_cgraph_replace_node(cgraph_node*, cgraph_node*)
     2.00%  lto1-wpa  lto1               [.] streamer_get_pickled_tree(lto_input_block*, data_in*)
     2.00%  lto1-wpa  libc-2.19.so       [.] msort_with_tmp.part.0
     1.91%  lto1-wpa  lto1               [.] ipa_icf::sem_variable::equals(tree_node*, tree_node*)
     1.72%  lto1-wpa  libc-2.19.so       [.] _int_malloc
     1.70%  lto1-wpa  lto1               [.] symbol_table::remove_unreachable_nodes(_IO_FILE*)
     1.54%  lto1-wpa  lto1               [.] lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
     1.33%  lto1-wpa  lto1               [.] inflate
     1.21%  lto1-wpa  lto1               [.] adler32
     1.16%  lto1-wpa  lto1               [.] cgraph_node::call_for_symbol_thunks_and_aliases(bool (*)(cgraph_node*, void*), void*, bool, bool)
     1.11%  lto1-wpa  lto1               [.] lto_input_tree(lto_input_block*, data_in*)
     1.07%  lto1-wpa  lto1               [.] streamer_read_tree_body(lto_input_block*, data_in*, tree_node*)
     1.03%  lto1-wpa  lto1               [.] lto_input_location(bitpack_d*, data_in*)
     1.01%  lto1-wpa  lto1               [.] htab_hash_string
     0.99%  lto1-wpa  lto1               [.] estimate_calls_size_and_time(cgraph_node*, int*, int*, int*, int*, unsigned int, vec<tree_node*, va_heap, vl_ptr>, vec<ipa_polymorphic_call_context, va_heap, vl_ptr>, vec<ipa_agg_jump_function*, va_heap, vl_ptr>) [clone .isra.137]
     0.92%  lto1-wpa  lto1               [.] ht_lookup(ht*, unsigned char const*, unsigned long, ht_lookup_option)
     0.92%  lto1-wpa  lto1               [.] ggc_internal_alloc(unsigned long, void (*)(void*), unsigned long, unsigned long)
     0.86%  lto1-wpa  lto1               [.] splay_tree_splay
     0.83%  lto1-wpa  lto1               [.] bp_unpack_var_len_unsigned(bitpack_d*)
     0.80%  lto1-wpa  libc-2.19.so       [.] malloc_consolidate
     0.77%  lto1-wpa  lto1               [.] can_inline_edge_p(cgraph_edge*, bool, bool)
     0.72%  lto1-wpa  lto1               [.] gimple_has_body_p(tree_node*)



Thanks,
Martin

Comments

Jan Hubicka Feb. 17, 2015, 6:38 p.m. UTC | #1
Hi,
thanks for working on it.  There are basically 3 independent changes in the patch:
 - The patch to make checking in lto_streamer_init ENABLE_CHECKING only, which I
   think can be committed as obvious.
 - Templates for call_for_symbol_and_aliases.
   I do not think these should be strictly necessary for performance, because once we
   spend too much time in these we are a bit screwed.
   However, I see that it also makes things a bit nicer by not needing typecasts on the data pointer.
   Perhaps that could be cleaned up further?

   An alternative would be to implement the FOR_EACH_ALIAS macro with a tree-walking iterator.
   You have all the structure needed to avoid a stack.  The iterator will contain a
   root node, the current node and an index to the ref.
   This may be even easier to use and will probably wind up generating about the same code,
   given that the for-each template needs to produce a self-recursive function anyway.

   I would not care about call_for_symbol_thunks_and_aliases.  That function is heavy because it
   walks all callers anyway, and it should not be used in hot code.
   I have a patch that removes its use from the inliner - it is more or less a leftover from the
   time when we represented thunks as special aliases instead of functions w/o gimple body.
 - the caching itself.
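
The stack-free walk suggested in the second item could look roughly like the sketch below. It
is an illustration under assumptions, not GCC code: `sym` and `alias_iterator` are hypothetical,
a plain vector replaces the ipa_ref list, each node remembers its position in its target's alias
list (instead of the iterator carrying the ref index), and the include_overwritable filtering is
left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified symbol: a vector of children stands in for
// the alias references that GCC keeps per symtab node.
struct sym
{
  int id;
  sym *target = nullptr;            // node this alias points to (parent)
  std::size_t index_in_target = 0;  // position in target->aliases
  std::vector<sym *> aliases;       // direct aliases (children)

  explicit sym (int i) : id (i) {}

  void add_alias (sym *a)
  {
    a->target = this;
    a->index_in_target = aliases.size ();
    aliases.push_back (a);
  }
};

// Pre-order walk over ROOT and all transitive aliases without recursion
// or an explicit stack; the state is just the root and the current node.
class alias_iterator
{
public:
  explicit alias_iterator (sym *root) : m_root (root), m_cur (root) {}

  sym *current () const { return m_cur; }   // nullptr once finished

  void next ()
  {
    if (!m_cur)
      return;
    // Descend to the first alias if there is one.
    if (!m_cur->aliases.empty ())
      {
        m_cur = m_cur->aliases[0];
        return;
      }
    // Otherwise climb until a next sibling exists or the root is reached.
    while (m_cur != m_root)
      {
        sym *parent = m_cur->target;
        std::size_t next_index = m_cur->index_in_target + 1;
        if (next_index < parent->aliases.size ())
          {
            m_cur = parent->aliases[next_index];
            return;
          }
        m_cur = parent;
      }
    m_cur = nullptr;   // back at the root: traversal is complete
  }

private:
  sym *m_root;
  sym *m_cur;
};
```

A loop of the form `for (alias_iterator it (&root); it.current (); it.next ())` then visits the
root first and every alias exactly once, and an early `break` replaces the "callback returned
true" convention of the callback-based walkers.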

I will look into the caching in detail.  I am not quite sure I like the idea of exposing an
inline-only cache in cgraph.h.  You could just keep the predicates as they are, but have inline_
variants in ipa-inline.h that do the caching for you.
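
One possible shape of such inline_ variants is sketched below. Everything here is an assumption
made for illustration: `fn_node` stands in for cgraph_node, the cache is a hash map keyed by a
node uid, and `inline_predicate_cache` is a made-up name. The mail does not say where the cached
values should live, and the next paragraph considers bits in cgraph_node acceptable.

```cpp
#include <unordered_map>

// Hypothetical stand-in for cgraph_node; the predicate stays uncached
// and the node class knows nothing about the inliner's cache.
struct fn_node
{
  int uid;
  bool address_taken;
  mutable int compute_calls;   // counts how often the slow path runs

  fn_node (int u, bool taken)
    : uid (u), address_taken (taken), compute_calls (0) {}

  bool can_remove_if_no_direct_calls_p () const
  {
    ++compute_calls;
    return !address_taken;
  }
};

// Cache owned by the inliner: created before the inlining loop and
// cleared after it, so stale values cannot leak into later passes.
class inline_predicate_cache
{
public:
  bool inline_can_remove_if_no_direct_calls_p (const fn_node &n)
  {
    auto it = m_can_remove.find (n.uid);
    if (it != m_can_remove.end ())
      return it->second;
    bool value = n.can_remove_if_no_direct_calls_p ();
    m_can_remove.emplace (n.uid, value);
    return value;
  }

  void clear () { m_can_remove.clear (); }

private:
  std::unordered_map<int, bool> m_can_remove;
};
```

The trade-off compared with bitfields in the node is a hash lookup per query, in exchange for
keeping the node class free of inliner-specific state.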

Allocating the bits directly in cgraph_node is probably OK; we don't really have a shortage there,
and it can be revisited easily later...

Honza
Patch

From 4e878a928ff7e9fe4eee0ea4b241c01c4440bd60 Mon Sep 17 00:00:00 2001
From: mliska <mliska@suse.cz>
Date: Mon, 16 Feb 2015 16:48:01 +0100
Subject: [PATCH] ipa-inline: introduce computed value that speeds up IPA
 inliner.

---
 gcc/cgraph.c       |  77 -------------
 gcc/cgraph.h       | 309 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 gcc/ipa-inline.c   |   2 +
 gcc/lto-streamer.c |   2 +
 gcc/symtab.c       |  48 ++++++---
 5 files changed, 345 insertions(+), 93 deletions(-)
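
Note on the template form used in the cgraph.h and symtab.c changes below: passing the callback
as a non-type template parameter keeps the data argument typed and makes the callee known at
compile time. The following stand-alone sketch contrasts the two styles; `item`,
`for_each_untyped`, `for_each_typed` and the two setters are hypothetical names, not GCC
functions.

```cpp
#include <vector>

// Hypothetical node with a section name and a list of aliases.
struct item
{
  const char *section = nullptr;
  std::vector<item *> aliases;
};

// Old style: the callback receives untyped data and must cast it back.
inline bool
for_each_untyped (item *n, bool (*callback) (item *, void *), void *data)
{
  if (callback (n, data))
    return true;
  for (item *a : n->aliases)
    if (for_each_untyped (a, callback, data))
      return true;
  return false;
}

// Template style: the callback is a compile-time constant and ARG keeps
// its real type, so no casts are needed at the call site or in the callback.
template <typename Arg, bool (*callback) (item *, Arg)>
inline bool
for_each_typed (item *n, Arg data)
{
  if (callback (n, data))
    return true;
  for (item *a : n->aliases)
    if (for_each_typed<Arg, callback> (a, data))
      return true;
  return false;
}

inline bool
set_section_untyped (item *n, void *s)
{
  n->section = static_cast<const char *> (s);
  return false;
}

inline bool
set_section_typed (item *n, const char *s)
{
  n->section = s;
  return false;
}
```

With the typed variant a call reads `for_each_typed<const char *, set_section_typed> (&root,
name)`, whereas the untyped one needs a cast of `name` to `void *` and back.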

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 3548bd0..b72a6c0 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2403,83 +2403,6 @@  cgraph_edge::maybe_hot_p (void)
   return true;
 }
 
-/* Worker for cgraph_can_remove_if_no_direct_calls_p.  */
-
-static bool
-nonremovable_p (cgraph_node *node, void *)
-{
-  return !node->can_remove_if_no_direct_calls_and_refs_p ();
-}
-
-/* Return true when function cgraph_node and its aliases can be removed from
-   callgraph if all direct calls are eliminated.  */
-
-bool
-cgraph_node::can_remove_if_no_direct_calls_p (void)
-{
-  /* Extern inlines can always go, we will use the external definition.  */
-  if (DECL_EXTERNAL (decl))
-    return true;
-  if (address_taken)
-    return false;
-  return !call_for_symbol_and_aliases (nonremovable_p, NULL, true);
-}
-
-/* Return true when function cgraph_node can be expected to be removed
-   from program when direct calls in this compilation unit are removed.
-
-   As a special case COMDAT functions are
-   cgraph_can_remove_if_no_direct_calls_p while the are not
-   cgraph_only_called_directly_p (it is possible they are called from other
-   unit)
-
-   This function behaves as cgraph_only_called_directly_p because eliminating
-   all uses of COMDAT function does not make it necessarily disappear from
-   the program unless we are compiling whole program or we do LTO.  In this
-   case we know we win since dynamic linking will not really discard the
-   linkonce section.  */
-
-bool
-cgraph_node::will_be_removed_from_program_if_no_direct_calls_p (void)
-{
-  gcc_assert (!global.inlined_to);
-
-  if (call_for_symbol_and_aliases (used_from_object_file_p_worker,
-				   NULL, true))
-    return false;
-  if (!in_lto_p && !flag_whole_program)
-    return only_called_directly_p ();
-  else
-    {
-       if (DECL_EXTERNAL (decl))
-         return true;
-      return can_remove_if_no_direct_calls_p ();
-    }
-}
-
-
-/* Worker for cgraph_only_called_directly_p.  */
-
-static bool
-cgraph_not_only_called_directly_p_1 (cgraph_node *node, void *)
-{
-  return !node->only_called_directly_or_aliased_p ();
-}
-
-/* Return true when function cgraph_node and all its aliases are only called
-   directly.
-   i.e. it is not externally visible, address was not taken and
-   it is not used in any other non-standard way.  */
-
-bool
-cgraph_node::only_called_directly_p (void)
-{
-  gcc_assert (ultimate_alias_target () == this);
-  return !call_for_symbol_and_aliases (cgraph_not_only_called_directly_p_1,
-				       NULL, true);
-}
-
-
 /* Collect all callers of NODE.  Worker for collect_callers_of_node.  */
 
 static bool
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 06d2704..39cb340 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -261,17 +261,29 @@  public:
 				  void *data,
 				  bool include_overwrite);
 
+  /* Call callback on symtab node and aliases associated to this node.
+     When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+     skipped.  */
+  template <typename Arg, bool (*callback) (symtab_node*, Arg arg)>
+  bool call_for_symbol_and_aliases (Arg data, bool include_overwrite);
+
   /* If node can not be interposable by static or dynamic linker to point to
      different definition, return this symbol. Otherwise look for alias with
      such property and if none exists, introduce new one.  */
   symtab_node *noninterposable_alias (void);
 
+  /* Worker searching noninterposable alias.  */
+  static bool noninterposable_alias (symtab_node *node, symtab_node **data);
+
   /* Return node that alias is aliasing.  */
   inline symtab_node *get_alias_target (void);
 
   /* Set section for symbol and its aliases.  */
   void set_section (const char *section);
 
+  /* Worker for set_section.  */
+  static bool set_section (symtab_node *n, const char *s);
+
   /* Set section, do not recurse into aliases.
      When one wants to change section of symbol and its aliases,
      use set_section.  */
@@ -523,6 +535,11 @@  protected:
   bool call_for_symbol_and_aliases_1 (bool (*callback) (symtab_node *, void *),
 				      void *data,
 				      bool include_overwrite);
+
+  /* Worker for call_for_symbol_and_aliases.  */
+  template <typename Arg, bool (*callback) (symtab_node *, Arg)>
+  bool call_for_symbol_and_aliases_1 (Arg data, bool include_overwritable);
+
 private:
   /* Worker for set_section.  */
   static bool set_section (symtab_node *n, void *s);
@@ -1042,6 +1059,13 @@  public:
 						      void *),
 				    void *data, bool include_overwritable);
 
+  /* Call callback on function and aliases associated to the function.
+     When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+     skipped. */
+  template <typename Arg, bool (*callback) (cgraph_node *, Arg)>
+  bool call_for_symbol_and_aliases (Arg data, bool include_overwritable);
+
+
   /* Call callback on cgraph_node, thunks and aliases associated to NODE.
      When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
      skipped.  When EXCLUDE_VIRTUAL_THUNKS is true, virtual thunks are
@@ -1052,6 +1076,15 @@  public:
 					   bool include_overwritable,
 					   bool exclude_virtual_thunks = false);
 
+  /* Call callback on cgraph_node, thunks and aliases associated to NODE.
+     When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+     skipped.  When EXCLUDE_VIRTUAL_THUNKS is true, virtual thunks are
+     skipped.  */
+  template <typename Arg, bool (*callback) (cgraph_node *, Arg)>
+  bool call_for_symbol_thunks_and_aliases (Arg data,
+					   bool include_overwritable,
+					   bool exclude_virtual_thunks = false);
+
   /* Likewise indicate that a node is needed, i.e. reachable via some
      external means.  */
   inline void mark_force_output (void);
@@ -1093,6 +1126,9 @@  public:
      the program unless we are compiling whole program or we do LTO.  In this
      case we know we win since dynamic linking will not really discard the
      linkonce section.  */
+  bool will_be_removed_from_program_if_no_direct_calls_compute_p (void);
+
+  /* Wrapper for will_be_removed_from_program_if_no_direct_calls_compute_p.  */
   bool will_be_removed_from_program_if_no_direct_calls_p (void);
 
   /* Return true when function can be removed from callgraph
@@ -1101,8 +1137,15 @@  public:
 
   /* Return true when function cgraph_node and its aliases can be removed from
      callgraph if all direct calls are eliminated.  */
+  bool can_remove_if_no_direct_calls_compute_p (void);
+
+  /* Wrapper for can_remove_if_no_direct_calls_compute_p.  */
   bool can_remove_if_no_direct_calls_p (void);
 
+  /* Worker for cgraph_can_remove_if_no_direct_calls_p.  */
+  static bool nonremovable_p (cgraph_node *node, void *);
+  static bool nonremovable_compute_p (cgraph_node *node, void *);
+
   /* Return true when callgraph node is a function with Gimple body defined
      in current unit.  Functions can also be define externally or they
      can be thunks with no Gimple representation.
@@ -1295,11 +1338,24 @@  public:
   /* True if there was multiple COMDAT bodies merged by lto-symtab.  */
   unsigned merged : 1;
 
+  /* IPA inline cached values.  */
+  unsigned inline_nonremovable_init: 1;
+  unsigned inline_can_remove_if_no_direct_calls_init: 1;
+  unsigned inline_will_be_removed_if_no_direct_calls_init: 1;
+
+  unsigned inline_nonremovable: 1;
+  unsigned inline_can_remove_if_no_direct_calls: 1;
+  unsigned inline_will_be_removed_if_no_direct_calls: 1;
+
 private:
   /* Worker for call_for_symbol_and_aliases.  */
   bool call_for_symbol_and_aliases_1 (bool (*callback) (cgraph_node *,
 						        void *),
 				      void *data, bool include_overwritable);
+
+  /* Worker for call_for_symbol_and_aliases.  */
+  template <typename Arg, bool (*callback) (cgraph_node *, Arg)>
+  bool call_for_symbol_and_aliases_1 (Arg data, bool include_overwritable);
 };
 
 /* A cgraph node set is a collection of cgraph nodes.  A cgraph node
@@ -1683,6 +1739,12 @@  public:
 				    void *data,
 				    bool include_overwritable);
 
+  /* Call callback on varpool symbol and aliases associated to varpool symbol.
+     When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+     skipped. */
+  template <typename Arg, bool (*callback) (varpool_node *, Arg)>
+  bool call_for_symbol_and_aliases (Arg data, bool include_overwritable);
+
   /* Return true when variable should be considered externally visible.  */
   bool externally_visible_p (void);
 
@@ -1761,6 +1823,10 @@  private:
   bool call_for_symbol_and_aliases_1 (bool (*callback) (varpool_node *, void *),
 				      void *data,
 				      bool include_overwritable);
+
+  /* Worker for call_for_symbol_and_aliases.  */
+  template <typename Arg, bool (*callback) (varpool_node*, Arg arg)>
+  bool call_for_symbol_and_aliases_1 (Arg data, bool include_overwritable);
 };
 
 /* Every top level asm statement is put into a asm_node.  */
@@ -1862,7 +1928,7 @@  public:
   friend class cgraph_node;
   friend class cgraph_edge;
 
-  symbol_table (): cgraph_max_summary_uid (1)
+  symbol_table (): cgraph_max_summary_uid (1), enable_inline_cache (false)
   {
   }
 
@@ -2101,6 +2167,9 @@  public:
 
   FILE* GTY ((skip)) dump_file;
 
+  /* Inline cache flag.  */
+  bool enable_inline_cache;
+
 private:
   /* Allocate new callgraph node.  */
   inline cgraph_node * allocate_cgraph_symbol (void);
@@ -2987,6 +3056,21 @@  symtab_node::call_for_symbol_and_aliases (bool (*callback) (symtab_node *,
   return false;
 }
 
+template <typename Arg, bool (*callback) (symtab_node *, Arg arg)>
+inline bool
+symtab_node::call_for_symbol_and_aliases (Arg data, bool include_overwritable)
+{
+  ipa_ref *ref;
+
+  if (callback (this, data))
+    return true;
+  if (iterate_direct_aliases (0, ref))
+    return call_for_symbol_and_aliases_1 <Arg, callback>
+      (data, include_overwritable);
+  return false;
+}
+
+
 /* Call callback on function and aliases associated to the function.
    When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
    skipped.  */
@@ -3004,6 +3088,43 @@  cgraph_node::call_for_symbol_and_aliases (bool (*callback) (cgraph_node *,
   return false;
 }
 
+/* Call callback on function and aliases associated to the function.
+   When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+   skipped.  */
+
+template <typename Arg, bool (*callback) (cgraph_node *, Arg arg)>
+inline bool
+cgraph_node::call_for_symbol_and_aliases (Arg data, bool include_overwritable)
+{
+  ipa_ref *ref;
+
+  if (callback (this, data))
+    return true;
+
+  if (iterate_direct_aliases (0, ref))
+    return call_for_symbol_and_aliases_1 <Arg, callback> (data, include_overwritable);
+
+  return false;
+}
+
+template <typename Arg, bool (*callback) (cgraph_node *, Arg arg)>
+inline bool
+cgraph_node::call_for_symbol_and_aliases_1 (Arg data, bool include_overwritable)
+{
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (this, ref)
+    {
+      cgraph_node *alias = dyn_cast <cgraph_node *> (ref->referring);
+      if (include_overwritable
+	  || alias->get_availability () > AVAIL_INTERPOSABLE)
+	if (alias->call_for_symbol_and_aliases <Arg, callback> (data, include_overwritable))
+	  return true;
+    }
+
+  return false;
+}
+
+
 /* Call calback on varpool symbol and aliases associated to varpool symbol.
    When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
    skipped. */
@@ -3021,6 +3142,47 @@  varpool_node::call_for_symbol_and_aliases (bool (*callback) (varpool_node *,
   return false;
 }
 
+
+/* Call callback on varpool symbol and aliases associated to varpool symbol.
+   When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+   skipped. */
+
+template <typename Arg, bool (*callback) (varpool_node*, Arg arg)>
+inline bool
+varpool_node::call_for_symbol_and_aliases (Arg data, bool include_overwritable)
+{
+  ipa_ref *ref;
+
+  if (callback (this, data))
+    return true;
+  if (iterate_direct_aliases (0, ref))
+    return call_for_symbol_and_aliases_1 <Arg, callback>
+      (data, include_overwritable);
+
+  return false;
+}
+
+/* Worker for call_for_symbol_and_aliases.  */
+
+template <typename Arg, bool (*callback) (varpool_node*, Arg arg)>
+bool
+varpool_node::call_for_symbol_and_aliases_1 (Arg data,
+					     bool include_overwritable)
+{
+  ipa_ref *ref;
+
+  FOR_EACH_ALIAS (this, ref)
+    {
+      varpool_node *alias = dyn_cast <varpool_node *> (ref->referring);
+      if (include_overwritable
+	  || alias->get_availability () > AVAIL_INTERPOSABLE)
+	if (alias->call_for_symbol_and_aliases <Arg, callback>
+	  (data, include_overwritable))
+	    return true;
+    }
+  return false;
+}
+
 /* Build polymorphic call context for indirect call E.  */
 
 inline
@@ -3094,6 +3256,151 @@  cgraph_local_p (cgraph_node *node)
   return node->local.local && node->instrumented_version->local.local;
 }
 
+inline bool
+cgraph_node::nonremovable_compute_p (cgraph_node *node, void *)
+{
+  return !node->can_remove_if_no_direct_calls_and_refs_p ();
+}
+
+inline bool
+cgraph_node::nonremovable_p (cgraph_node *node, void *)
+{
+  bool retval;
+
+  if (symtab->enable_inline_cache)
+    {
+      if (!node->inline_nonremovable_init)
+        {
+	  node->inline_nonremovable = nonremovable_compute_p (node, NULL);
+	  node->inline_nonremovable_init = true;
+	}
+
+      retval = node->inline_nonremovable;
+
+      gcc_checking_assert (retval == nonremovable_compute_p (node, NULL));
+    }
+  else
+    retval = nonremovable_compute_p (node, NULL);
+
+  return retval;
+}
+
+inline bool
+cgraph_node::can_remove_if_no_direct_calls_compute_p (void)
+{
+  if (DECL_EXTERNAL (decl))
+    return true;
+  if (address_taken)
+    return false;
+
+  return !call_for_symbol_and_aliases <void *, cgraph_node::nonremovable_compute_p>
+    (NULL, true);
+}
+
+/* Return true when function cgraph_node and its aliases can be removed from
+   callgraph if all direct calls are eliminated.  */
+
+inline bool
+cgraph_node::can_remove_if_no_direct_calls_p (void)
+{
+  bool retval;
+
+  if (symtab->enable_inline_cache)
+  {
+    if (!inline_can_remove_if_no_direct_calls_init)
+      {
+	inline_can_remove_if_no_direct_calls = can_remove_if_no_direct_calls_compute_p ();
+	inline_can_remove_if_no_direct_calls_init = true;
+      }
+
+    retval = inline_can_remove_if_no_direct_calls;
+
+    gcc_checking_assert
+      (retval == can_remove_if_no_direct_calls_compute_p ());
+  }
+  else
+    retval = can_remove_if_no_direct_calls_compute_p ();
+
+  return retval;
+}
+
+/* Return true when function cgraph_node can be expected to be removed
+   from program when direct calls in this compilation unit are removed.
+
+   As a special case COMDAT functions are
+   cgraph_can_remove_if_no_direct_calls_p while they are not
+   cgraph_only_called_directly_p (it is possible they are called from other
+   unit)
+
+   This function behaves as cgraph_only_called_directly_p because eliminating
+   all uses of COMDAT function does not make it necessarily disappear from
+   the program unless we are compiling whole program or we do LTO.  In this
+   case we know we win since dynamic linking will not really discard the
+   linkonce section.  */
+
+inline bool
+cgraph_node::will_be_removed_from_program_if_no_direct_calls_compute_p (void)
+{
+  gcc_assert (!global.inlined_to);
+
+  if (call_for_symbol_and_aliases <void *, used_from_object_file_p_worker>
+    (NULL, true))
+      return false;
+  if (!in_lto_p && !flag_whole_program)
+    return only_called_directly_p ();
+  else
+    {
+       if (DECL_EXTERNAL (decl))
+         return true;
+      return can_remove_if_no_direct_calls_p ();
+    }
+}
+
+/* Wrapper for will_be_removed_from_program_if_no_direct_calls_compute_p.  */
+
+inline bool
+cgraph_node::will_be_removed_from_program_if_no_direct_calls_p (void)
+{
+  if (symtab->enable_inline_cache)
+    {
+      if (!inline_will_be_removed_if_no_direct_calls_init)
+        {
+	  inline_will_be_removed_if_no_direct_calls
+	    = will_be_removed_from_program_if_no_direct_calls_compute_p ();
+
+	  inline_will_be_removed_if_no_direct_calls_init = true;
+        }
+
+      gcc_checking_assert (inline_will_be_removed_if_no_direct_calls ==
+	will_be_removed_from_program_if_no_direct_calls_compute_p ());
+      return inline_will_be_removed_if_no_direct_calls;
+    }
+
+  return will_be_removed_from_program_if_no_direct_calls_compute_p ();
+}
+
+/* Worker for cgraph_only_called_directly_p.  */
+
+static bool
+cgraph_not_only_called_directly_p_1 (cgraph_node *node, void *)
+{
+  return !node->only_called_directly_or_aliased_p ();
+}
+
+/* Return true when function cgraph_node and all its aliases are only called
+   directly.
+   i.e. it is not externally visible, address was not taken and
+   it is not used in any other non-standard way.  */
+
+inline bool
+cgraph_node::only_called_directly_p (void)
+{
+  gcc_assert (ultimate_alias_target () == this);
+  return !call_for_symbol_and_aliases (cgraph_not_only_called_directly_p_1,
+				       NULL, true);
+}
+
+
 /* When using fprintf (or similar), problems can arise with
    transient generated strings.  Many string-generation APIs
    only support one result being alive at once (e.g. by
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 287a6dd..8a07e04 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1651,6 +1651,7 @@  inline_small_functions (void)
   ipa_reduced_postorder (order, true, true, NULL);
   free (order);
 
+  symtab->enable_inline_cache = true;
   FOR_EACH_DEFINED_FUNCTION (node)
     if (!node->global.inlined_to)
       {
@@ -1966,6 +1967,7 @@  inline_small_functions (void)
 	}
     }
 
+  symtab->enable_inline_cache = false;
   free_growth_caches ();
   if (dump_file)
     fprintf (dump_file,
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 836dce9..542a813 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -319,11 +319,13 @@  static hash_table<tree_hash_entry> *tree_htab;
 void
 lto_streamer_init (void)
 {
+#ifdef ENABLE_CHECKING
   /* Check that all the TS_* handled by the reader and writer routines
      match exactly the structures defined in treestruct.def.  When a
      new TS_* astructure is added, the streamer should be updated to
      handle it.  */
   streamer_check_handled_ts_structures ();
+#endif
 
 #ifdef LTO_STREAMER_DEBUG
   tree_htab = new hash_table<tree_hash_entry> (31);
diff --git a/gcc/symtab.c b/gcc/symtab.c
index ee47a73..df0950b 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -1337,9 +1337,9 @@  symtab_node::set_section_for_node (const char *section)
 /* Worker for set_section.  */
 
 bool
-symtab_node::set_section (symtab_node *n, void *s)
+symtab_node::set_section (symtab_node *n, const char *s)
 {
-  n->set_section_for_node ((char *)s);
+  n->set_section_for_node (s);
   return false;
 }
 
@@ -1349,8 +1349,7 @@  void
 symtab_node::set_section (const char *section)
 {
   gcc_assert (!this->alias);
-  call_for_symbol_and_aliases
-    (symtab_node::set_section, const_cast<char *>(section), true);
+  call_for_symbol_and_aliases <const char *, symtab_node::set_section> (section, true);
 }
 
 /* Return the initialization priority.  */
@@ -1491,10 +1490,11 @@  symtab_node::resolve_alias (symtab_node *target)
     {
       error ("section of alias %q+D must match section of its target", decl);
     }
-  call_for_symbol_and_aliases (symtab_node::set_section,
-			     const_cast<char *>(target->get_section ()), true);
+  call_for_symbol_and_aliases <const char *, symtab_node::set_section>
+    (const_cast<char *>(target->get_section ()), true);
   if (target->implicit_section)
-    call_for_symbol_and_aliases (set_implicit_section, NULL, true);
+    call_for_symbol_and_aliases <void *, symtab_node::set_implicit_section>
+      (NULL, true);
 
   /* Alias targets become redundant after alias is resolved into an reference.
      We do not want to keep it around or we would have to mind updating them
@@ -1513,7 +1513,7 @@  symtab_node::resolve_alias (symtab_node *target)
 /* Worker searching noninterposable alias.  */
 
 bool
-symtab_node::noninterposable_alias (symtab_node *node, void *data)
+symtab_node::noninterposable_alias (symtab_node *node, symtab_node **data)
 {
   if (decl_binds_to_current_def_p (node->decl))
     {
@@ -1530,7 +1530,7 @@  symtab_node::noninterposable_alias (symtab_node *node, void *data)
 	  || DECL_ATTRIBUTES (node->decl) != DECL_ATTRIBUTES (fn->decl))
 	return false;
 
-      *(symtab_node **)data = node;
+      *data = node;
       return true;
     }
   return false;
@@ -1550,8 +1550,8 @@  symtab_node::noninterposable_alias (void)
      (if that is already non-overwritable).  */
   symtab_node *node = ultimate_alias_target ();
   gcc_assert (!node->alias && !node->weakref);
-  node->call_for_symbol_and_aliases (symtab_node::noninterposable_alias,
-				   (void *)&new_node, true);
+  node->call_for_symbol_and_aliases
+    <symtab_node **, symtab_node::noninterposable_alias> (&new_node, true);
   if (new_node)
     return new_node;
 #ifndef ASM_OUTPUT_DEF
@@ -1840,10 +1840,8 @@  symtab_node::equal_address_to (symtab_node *s2)
 /* Worker for call_for_symbol_and_aliases.  */
 
 bool
-symtab_node::call_for_symbol_and_aliases_1 (bool (*callback) (symtab_node *,
-							      void *),
-					    void *data,
-					    bool include_overwritable)
+symtab_node::call_for_symbol_and_aliases_1 (bool (*callback) (symtab_node *,void *),
+                                           void *data, bool include_overwritable)
 {
   ipa_ref *ref;
   FOR_EACH_ALIAS (this, ref)
@@ -1857,3 +1855,23 @@  symtab_node::call_for_symbol_and_aliases_1 (bool (*callback) (symtab_node *,
     }
   return false;
 }
+
+/* Worker for call_for_symbol_and_aliases.  */
+
+template <typename Arg, bool (*callback) (symtab_node*, Arg arg)>
+bool
+symtab_node::call_for_symbol_and_aliases_1 (Arg data,
+					    bool include_overwritable)
+{
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (this, ref)
+    {
+      symtab_node *alias = ref->referring;
+      if (include_overwritable
+	  || alias->get_availability () > AVAIL_INTERPOSABLE)
+	if (alias->call_for_symbol_and_aliases <Arg, callback> (data,
+					      include_overwritable))
+	  return true;
+    }
+  return false;
+}
-- 
2.1.2