Patchwork Type inheritance graph analysis & speculative devirtualization, part 4/7 (unreachable virtual method removal)

login
register
mail settings
Submitter Jan Hubicka
Date Sept. 8, 2013, 4:31 p.m.
Message ID <20130908163150.GA22772@kam.mff.cuni.cz>
Download mbox | patch
Permalink /patch/273451/
State New
Headers show

Comments

Jan Hubicka - Sept. 8, 2013, 4:31 p.m.
Hi,
this patch make symtab_remove_unreachable_nodes to use polymorphic call analysis
same was as the cgraph construction code.  The use is basically simple - instead
of marking all virtual functions as potentially reachable until inlining,
we mark only those that appear in the list of possible targets.   Like in
cgraph construction I deal with two special cases

1) when there is only one target and it is final list, I devirtualize
2) when the target is in anonymous namespace, I do not mark it live.
   If it is really live, its virtual table is live and it will be marked
   while visitng it.

I extended now 1) to also deal with case of 0 possible targets - then we turn
call to direct call to BUILT_IN_UNREACHABLE.  This happens i.e. when you have
type in anonymous namespace and all constructors of the type are optimized out.
I retrofited the change into same place in the cgraph construction code.

I also removed non-speculative devirtualization from ipa-devirt pass, since it
is always done by unreachable code removal now.  It makes more sense there: it
is better to devirtualize early and the test is essentially free during the
walk.

To get analysis of anonymous namespace types right, ipa-devirt code now check
reachablity of virtual table it visits.  This makes the target cache sensitive
to list of reachable virtual tables and thus it needs to register newly
introduced varpool_node_removal hook and flush the cache when this happens.

New timevar is added to keep time spent by unreachable code removal logged.  It
is not increased in any dramatic way for Firefox WPA build - it stays around
1-2% of WPA time.

The effect of this patch on firefox is not grand. About 2000 extra functions
are removed prior inlining, but it is not saving peak memory since they are
already loaded to memory anyway and formely they was removed after inlining.  I
did not measure effect on inliner decision quality that ought to be positively
affected by this. This code will become more effective with followup patch for
better hiearchy analysis (when it will be able to take into account knowledge
that becomes known by early inlining and thus it will become stronger than the
equivalent logic done at cgraph construction time).

Bootstrapped/regtested x86_64-linux with workaround for the current pt.c ICE,
will commit it shortly.

Honza

	* cgraphunit.c (walk_polymorphic_call_targets): Permit 0 possible
	targets and devirtualize to BUILT_IN_UNREACHABLE.
	* timevar.def (TV_IPA_UNREACHABLE): New timevar.
	* ipa.c (walk_polymorphic_call_targets): New function.
	(symtab_remove_unreachable_nodes): Use it; do not keep all virtual
	functions; use the new timevar.
	* ipa-devirt.c (maybe_record_node): Do not insert static nodes that
	was removed from the program.
	(record_binfo): If BINFO corresponds to an anonymous namespace, we may
	not consider it in the walk when its vtable is dead.
	(possible_polymorphic_call_targets_1): Pass anonymous flag to
	record_binfo.
	(devirt_variable_node_removal_hook): New function.
	(possible_polymorphic_call_targets): Also register
	devirt_variable_node_removal_hook.
	(ipa_devirt): Do not do non-speculative devirtualization.
	(gate_ipa_devirt): One execute if devirtualizing speculatively.

	* testsuite/g++.dg/ipa/devirt-11.C: Update template.
	* testsuite/g++.dg/ipa/devirt-16.C: New testcase.
	* testsuite/g++.dg/ipa/devirt-17.C: New testcase.
	* testsuite/g++.dg/ipa/devirt-18.C: New testcase.

Patch

Index: cgraphunit.c
===================================================================
--- cgraphunit.c	(revision 202364)
+++ cgraphunit.c	(working copy)
@@ -866,9 +866,15 @@  walk_polymorphic_call_targets (pointer_s
      make the edge direct.  */
   if (final)
     {
-      gcc_assert (targets.length());
-      if (targets.length() == 1)
+      if (targets.length() <= 1)
 	{
+	  cgraph_node *target;
+	  if (targets.length () == 1)
+	    target = targets[0];
+	  else
+	    target = cgraph_get_create_node
+		       (builtin_decl_implicit (BUILT_IN_UNREACHABLE));
+
 	  if (cgraph_dump_file)
 	    {
 	      fprintf (cgraph_dump_file,
@@ -877,7 +883,7 @@  walk_polymorphic_call_targets (pointer_s
 				 edge->call_stmt, 0,
 				 TDF_SLIM);
 	    }
-	  cgraph_make_edge_direct (edge, targets[0]);
+	  cgraph_make_edge_direct (edge, target);
 	  cgraph_redirect_edge_call_stmt_to_callee (edge);
 	  if (cgraph_dump_file)
 	    {
@@ -1092,7 +1098,7 @@  analyze_functions (void)
      mangling and same body alias creation before we free DECL_ARGUMENTS
      used by it.  */
   if (!seen_error ())
-  symtab_initialize_asm_name_hash ();
+    symtab_initialize_asm_name_hash ();
 }
 
 /* Translate the ugly representation of aliases as alias pairs into nice
Index: testsuite/g++.dg/ipa/devirt-16.C
===================================================================
--- testsuite/g++.dg/ipa/devirt-16.C	(revision 0)
+++ testsuite/g++.dg/ipa/devirt-16.C	(revision 0)
@@ -0,0 +1,39 @@ 
+/* We shall devirtualize to unreachable.  No anonymous type method should surivve
+   reachability.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-whole-program"  } */
+namespace {
+class B {
+public:
+  virtual int foo(void)
+{
+  return 0;
+}
+};
+class A : public B {
+public:
+  virtual int foo(void)
+{
+  return 1;
+}
+};
+}
+class B *b;
+main()
+{
+  int c;
+  if (c)
+    {
+    class A a;
+    a.foo();
+    class B b;
+    b.foo();
+    }
+  return b->foo();
+}
+
+/* { dg-final { scan-ipa-dump "Devirtualizing" "whole-program"} } */
+/* { dg-final { scan-ipa-dump "builtin_unreachable" "whole-program"} } */
+/* { dg-final { scan-ipa-dump-not "A::foo" "whole-program"} } */
+/* { dg-final { scan-ipa-dump-not "A::foo" "whole-program"} } */
+/* { dg-final { cleanup-ipa-dump "whole-program" } } */
Index: testsuite/g++.dg/ipa/devirt-18.C
===================================================================
--- testsuite/g++.dg/ipa/devirt-18.C	(revision 0)
+++ testsuite/g++.dg/ipa/devirt-18.C	(revision 0)
@@ -0,0 +1,37 @@ 
+/* We shall devirtualize to unreachable.  No anonymous type method should surivve
+   reachability.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ssa"  } */
+namespace {
+class B {
+public:
+  virtual int foo(void)
+{
+  return 0;
+}
+};
+class A : public B {
+public:
+  virtual int foo(void)
+{
+  return 1;
+}
+};
+}
+class B *b;
+main()
+{
+  if (0)
+    {
+    class A a;
+    a.foo();
+    class B b;
+    b.foo();
+    }
+  return b->foo();
+}
+
+/* { dg-final { scan-tree-dump-not "A::foo" "ssa"} } */
+/* { dg-final { scan-tree-dump-not "B::foo" "ssa"} } */
+/* { dg-final { scan-tree-dump "builtin_unreachable" "ssa"} } */
+/* { dg-final { cleanup-tree-dump "ssa" } } */
Index: testsuite/g++.dg/ipa/devirt-11.C
===================================================================
--- testsuite/g++.dg/ipa/devirt-11.C	(revision 202364)
+++ testsuite/g++.dg/ipa/devirt-11.C	(working copy)
@@ -46,5 +46,4 @@  bar ()
    and two to fn3. While doing so the new symbol for fn2 needs to be
    introduced.  */
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target" 3 "inline"  } } */
-/* { dg-final { scan-ipa-dump-times "and turned into root of the clone tree" 1 "inline"  } } */
 /* { dg-final { cleanup-ipa-dump "inline" } } */
Index: testsuite/g++.dg/ipa/devirt-17.C
===================================================================
--- testsuite/g++.dg/ipa/devirt-17.C	(revision 0)
+++ testsuite/g++.dg/ipa/devirt-17.C	(revision 0)
@@ -0,0 +1,44 @@ 
+/* We shall devirtualize to B::foo since it is the only live candidate of an
+   anonymous type.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-whole-program"  } */
+namespace {
+class B {
+public:
+  virtual int foo(void)
+{
+  return 0;
+}
+};
+class A : public B {
+public:
+  virtual int foo(void)
+{
+  return 1;
+}
+};
+}
+class B *b;
+void get_me_lost (void *);
+main()
+{
+  int c;
+  if (c)
+    {
+    class A a;
+    a.foo();
+    }
+  else
+    {
+    b = new (class B);
+    b->foo();
+	get_me_lost ((void *)&b);
+    }
+  return b->foo();
+}
+
+/* { dg-final { scan-ipa-dump "Devirtualizing" "whole-program"} } */
+/* { dg-final { scan-ipa-dump-not "builtin_unreachable" "whole-program"} } */
+/* { dg-final { scan-ipa-dump "B::foo" "whole-program"} } */
+/* { dg-final { scan-ipa-dump-not "A::foo" "whole-program"} } */
+/* { dg-final { cleanup-ipa-dump "whole-program" } } */
Index: timevar.def
===================================================================
--- timevar.def	(revision 202364)
+++ timevar.def	(working copy)
@@ -64,6 +64,7 @@  DEFTIMEVAR (TV_PCH_CPP_RESTORE       , "
 
 DEFTIMEVAR (TV_CGRAPH                , "callgraph construction")
 DEFTIMEVAR (TV_CGRAPHOPT             , "callgraph optimization")
+DEFTIMEVAR (TV_IPA_UNREACHABLE       , "ipa dead code removal")
 DEFTIMEVAR (TV_IPA_INHERITANCE       , "ipa inheritance graph")
 DEFTIMEVAR (TV_IPA_VIRTUAL_CALL      , "ipa virtual call target")
 DEFTIMEVAR (TV_IPA_DEVIRT	     , "ipa devirtualization")
Index: ipa-devirt.c
===================================================================
--- ipa-devirt.c	(revision 202364)
+++ ipa-devirt.c	(working copy)
@@ -570,6 +570,8 @@  maybe_record_node (vec <cgraph_node *> &
       && fcode != BUILT_IN_TRAP
       && !pointer_set_insert (inserted, target)
       && (target_node = cgraph_get_node (target)) != NULL
+      && (TREE_PUBLIC (target)
+	  || target_node->symbol.definition)
       && symtab_real_symbol_p ((symtab_node)target_node))
     {
       pointer_set_insert (cached_polymorphic_call_targets,
@@ -591,6 +593,8 @@  maybe_record_node (vec <cgraph_node *> &
 
    MATCHED_VTABLES tracks virtual tables we already did lookup
    for virtual function in.
+
+   ANONYMOUS is true if BINFO is part of anonymous namespace.
   */
 
 static void
@@ -600,7 +604,8 @@  record_binfo (vec <cgraph_node *> &nodes
 	      tree type_binfo,
 	      HOST_WIDE_INT otr_token,
 	      pointer_set_t *inserted,
-	      pointer_set_t *matched_vtables)
+	      pointer_set_t *matched_vtables,
+	      bool anonymous)
 {
   tree type = BINFO_TYPE (binfo);
   int i;
@@ -611,6 +616,19 @@  record_binfo (vec <cgraph_node *> &nodes
   if (types_same_for_odr (type, otr_type)
       && !pointer_set_insert (matched_vtables, BINFO_VTABLE (type_binfo)))
     {
+      /* For types in anonymous namespace first check if the respective vtable
+	 is alive. If not, we know the type can't be called.  */
+      if (!flag_ltrans && anonymous)
+	{
+	  tree vtable = BINFO_VTABLE (type_binfo);
+	  struct varpool_node *vnode;
+
+	  if (TREE_CODE (vtable) == POINTER_PLUS_EXPR)
+	    vtable = TREE_OPERAND (TREE_OPERAND (vtable, 0), 0);
+	  vnode = varpool_get_node (vtable);
+	  if (!vnode || !vnode->symbol.definition)
+	    return;
+	}
       tree target = gimple_get_virt_method_for_binfo (otr_token, type_binfo);
       if (target)
 	maybe_record_node (nodes, target, inserted);
@@ -626,7 +644,7 @@  record_binfo (vec <cgraph_node *> &nodes
 		       is shared with the outer type.  */
 		    BINFO_VTABLE (base_binfo) ? base_binfo : type_binfo,
 		    otr_token, inserted,
-		    matched_vtables);
+		    matched_vtables, anonymous);
 }
      
 /* Lookup virtual methods matching OTR_TYPE (with OFFSET and OTR_TOKEN)
@@ -646,7 +664,7 @@  possible_polymorphic_call_targets_1 (vec
   unsigned int i;
 
   record_binfo (nodes, binfo, otr_type, binfo, otr_token, inserted,
-	        matched_vtables);
+	        matched_vtables, type->anonymous_namespace);
   for (i = 0; i < type->derived_types.length(); i++)
     possible_polymorphic_call_targets_1 (nodes, inserted, 
 					 matched_vtables,
@@ -735,6 +753,18 @@  devirt_node_removal_hook (struct cgraph_
     free_polymorphic_call_targets_hash ();
 }
 
+/* When virtual table is removed, we may need to flush the cache.  */
+
+static void
+devirt_variable_node_removal_hook (struct varpool_node *n,
+				   void *d ATTRIBUTE_UNUSED)
+{
+  if (cached_polymorphic_call_targets
+      && DECL_VIRTUAL_P (n->symbol.decl)
+      && type_in_anonymous_namespace_p (DECL_CONTEXT (n->symbol.decl)))
+    free_polymorphic_call_targets_hash ();
+}
+
 /* Return vector containing possible targets of polymorphic call of type
    OTR_TYPE caling method OTR_TOKEN with OFFSET.  If FINALp is non-NULL,
    store true if the list is complette. 
@@ -782,8 +812,12 @@  possible_polymorphic_call_targets (tree
       cached_polymorphic_call_targets = pointer_set_create ();
       polymorphic_call_target_hash.create (23);
       if (!node_removal_hook_holder)
-	node_removal_hook_holder =
-	  cgraph_add_node_removal_hook (&devirt_node_removal_hook, NULL);
+	{
+	  node_removal_hook_holder =
+	    cgraph_add_node_removal_hook (&devirt_node_removal_hook, NULL);
+	  varpool_add_node_removal_hook (&devirt_variable_node_removal_hook,
+					 NULL);
+	}
     }
 
   /* Lookup cached answer.  */
@@ -928,11 +962,8 @@  likely_target_p (struct cgraph_node *n)
 }
 
 /* The ipa-devirt pass.
-   This performs very trivial devirtualization:
-     1) when polymorphic call is known to have precisely one target,
-        turn it into direct call
-     2) when polymorphic call has only one likely target in the unit,
-        turn it into speculative call.  */
+   When polymorphic call has only one likely target in the unit,
+   turn it into speculative call.  */
 
 static unsigned int
 ipa_devirt (void)
@@ -965,26 +996,9 @@  ipa_devirt (void)
 	    if (dump_file)
 	      dump_possible_polymorphic_call_targets 
 		(dump_file, e);
+
 	    npolymorphic++;
 
-	    if (final)
-	      {
-		gcc_assert (targets.length());
-		if (targets.length() == 1)
-		  {
-		    if (dump_file)
-		      fprintf (dump_file,
-			       "Devirtualizing call in %s/%i to %s/%i\n",
-			       cgraph_node_name (n), n->symbol.order,
-			       cgraph_node_name (targets[0]), targets[0]->symbol.order);
-		    cgraph_make_edge_direct (e, targets[0]);
-		    ndevirtualized++;
-		    update = true;
-		    continue;
-		  }
-	      }
-	    if (!flag_devirtualize_speculatively)
-	      continue;
 	    if (!cgraph_maybe_hot_edge_p (e))
 	      {
 		if (dump_file)
@@ -1114,7 +1128,7 @@  ipa_devirt (void)
 static bool
 gate_ipa_devirt (void)
 {
-  return flag_devirtualize && !in_lto_p && optimize;
+  return flag_devirtualize_speculatively && !in_lto_p && optimize;
 }
 
 namespace {
Index: ipa.c
===================================================================
--- ipa.c	(revision 202364)
+++ ipa.c	(working copy)
@@ -149,6 +149,84 @@  process_references (struct ipa_ref_list
     }
 }
 
+/* EDGE is an polymorphic call.  If BEFORE_INLINING_P is set, mark
+   all its potential targets as reachable to permit later inlining if
+   devirtualization happens.  After inlining still keep their declarations
+   around, so we can devirtualize to a direct call.
+
+   Also try to make trivial devirutalization when no or only one target is
+   possible.  */
+
+static void
+walk_polymorphic_call_targets (pointer_set_t *reachable_call_targets,
+			       struct cgraph_edge *edge,
+			       symtab_node *first,
+			       pointer_set_t *reachable, bool before_inlining_p)
+{
+  unsigned int i;
+  void *cache_token;
+  bool final;
+  vec <cgraph_node *>targets
+    = possible_polymorphic_call_targets
+	(edge, &final, &cache_token);
+
+  if (!pointer_set_insert (reachable_call_targets,
+			   cache_token))
+    {
+      for (i = 0; i < targets.length(); i++)
+	{
+	  struct cgraph_node *n = targets[i];
+
+	  /* Do not bother to mark virtual methods in anonymous namespace;
+	     either we will find use of virtual table defining it, or it is
+	     unused.  */
+	  if (TREE_CODE (TREE_TYPE (n->symbol.decl)) == METHOD_TYPE
+	      && type_in_anonymous_namespace_p
+		    (method_class_type (TREE_TYPE (n->symbol.decl))))
+	    continue;
+
+	  /* Prior inlining, keep alive bodies of possible targets for
+	     devirtualization.  */
+	   if (n->symbol.definition
+	       && before_inlining_p)
+	     pointer_set_insert (reachable, n);
+
+	  /* Even after inlining we want to keep the possible targets in the
+	     boundary, so late passes can still produce direct call even if
+	     the chance for inlining is lost.  */
+	  enqueue_node ((symtab_node) n, first, reachable);
+	}
+    }
+
+  /* Very trivial devirtualization; when the type is
+     final or anonymous (so we know all its derivation)
+     and there is only one possible virtual call target,
+     make the edge direct.  */
+  if (final)
+    {
+      if (targets.length() <= 1)
+	{
+	  cgraph_node *target;
+	  if (targets.length () == 1)
+	    target = targets[0];
+	  else
+	    target = cgraph_get_create_node
+		       (builtin_decl_implicit (BUILT_IN_UNREACHABLE));
+
+	  if (dump_file)
+	    fprintf (dump_file,
+		     "Devirtualizing call in %s/%i to %s/%i\n",
+		     cgraph_node_name (edge->caller),
+		     edge->caller->symbol.order,
+		     cgraph_node_name (target), target->symbol.order);
+	  edge = cgraph_make_edge_direct (edge, target);
+	  if (cgraph_state != CGRAPH_STATE_IPA_SSA)
+	    cgraph_redirect_edge_call_stmt_to_callee (edge);
+	  else
+	    inline_update_overall_summary (edge->caller);
+	}
+    }
+}
 
 /* Perform reachability analysis and reclaim all unreachable nodes.
 
@@ -214,7 +292,9 @@  symtab_remove_unreachable_nodes (bool be
   bool changed = false;
   struct pointer_set_t *reachable = pointer_set_create ();
   struct pointer_set_t *body_needed_for_clonning = pointer_set_create ();
+  struct pointer_set_t *reachable_call_targets = pointer_set_create ();
 
+  timevar_push (TV_IPA_UNREACHABLE);
 #ifdef ENABLE_CHECKING
   verify_symtab ();
 #endif
@@ -238,10 +318,7 @@  symtab_remove_unreachable_nodes (bool be
       if (node->symbol.definition
 	  && !node->global.inlined_to
 	  && !node->symbol.in_other_partition
-	  && (!cgraph_can_remove_if_no_direct_calls_and_refs_p (node)
-	      /* Keep around virtual functions for possible devirtualization.  */
-	      || (before_inlining_p
-		  && DECL_VIRTUAL_P (node->symbol.decl))))
+	  && !cgraph_can_remove_if_no_direct_calls_and_refs_p (node))
 	{
 	  gcc_assert (!node->global.inlined_to);
 	  pointer_set_insert (reachable, node);
@@ -304,6 +381,19 @@  symtab_remove_unreachable_nodes (bool be
 	  if (!in_boundary_p)
 	    {
 	      struct cgraph_edge *e;
+	      /* Keep alive possible targets for devirtualization.  */
+	      if (optimize && flag_devirtualize)
+		{
+		  struct cgraph_edge *next;
+		  for (e = cnode->indirect_calls; e; e = next)
+		    {
+		      next = e->next_callee;
+		      if (e->indirect_info->polymorphic)
+			walk_polymorphic_call_targets (reachable_call_targets,
+						       e, &first, reachable,
+						       before_inlining_p);
+		    }
+		}
 	      for (e = cnode->callees; e; e = e->next_callee)
 		{
 		  if (e->callee->symbol.definition
@@ -449,6 +539,7 @@  symtab_remove_unreachable_nodes (bool be
 
   pointer_set_destroy (reachable);
   pointer_set_destroy (body_needed_for_clonning);
+  pointer_set_destroy (reachable_call_targets);
 
   /* Now update address_taken flags and try to promote functions to be local.  */
   if (file)
@@ -483,6 +574,7 @@  symtab_remove_unreachable_nodes (bool be
     FOR_EACH_DEFINED_FUNCTION (node)
       ipa_propagate_frequency (node);
 
+  timevar_pop (TV_IPA_UNREACHABLE);
   return changed;
 }