diff mbox

Type inheritance graph analysis & speculative devirtualization, part 1/6

Message ID 20130817193517.GA4750@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka Aug. 17, 2013, 7:35 p.m. UTC
Hi,
I have got the ipa-devirt implementation into shape it seems actually useful
and working as expected.  The main selling point of it for the moment is
compile time of C++ programs (especially with LTO) and verification facilities.

Currently we keep all virtual functions in the callgraph until after inlining
in hope that the type based devirtualization will find use of them.  This does
not happen in wast majorty of cases (currently about 400 times for Firefox).
Removing them early, at compile time, reduces tons of streaming and thus
linktime speed and memory use.  For firefox the callgraph node count
goes down from over 3 million functions, to hundred of thousdands.

Martin Liska did statistic for me with a recent Firefox tree (rev 201708).
Without the patch the overall build time is 33m real, 180m user.  LTO link time
is 8m real, 26m user.  Peaking over 15GB of memory use (combined GCC+linker),

With the patch it goes down to 26m real, 149m user and the linking
is 6m real, 24m user. Memory use is peaking at 4GB for WPA, parallel
part runs to 9GB, but that can be controlled by an parallelism (plus
I have independent patch to cut it to half).

Memory use graphs can be seen at
http://kam.mff.cuni.cz/~hubicka/build1.pdf
http://kam.mff.cuni.cz/~hubicka/build2.pdf

I think we thus arrived to shape LTO is practically useful for firefox.
I still hope to cut WPA in approx half in 4.9 timeframe and we need to address
issues with command line options merging at least.

Some of the improvements are seen w/o LTO, too.  We save running early opts
on all those functions, but I did not made any detailed stats.

Since the whole area of devirtualization in middle end seems to have very
little documentation and number of latent bugs and moreover I am by no means
expert on C++ type hiearchy a virtual calls, I decided to break up the change
into small patches and try to write an detailed description for each, so
hopefully it will be easier to eyeball the changes and hopefully identify
issues + get things more sane.

The plan is as follows
part 1: tracking of all polymorphic calls in indirect_info
        (until now we track
part 2: basic ipa-devirt type inheritance graph construction code,
        debugging dumps, function to get possible targets of given
        virtual function call and its use at callgraph construction
        time.

        This is the part bringing important improvements on compile time,
        especially with firefox LTO because we get rid of unreachable
	function calls comming from headers quickly without having them
	to bubble through early optimization, LTO streaming and merging.
part 4: verifier that all devirtualization decisions are in the list of
	possible targets + user warning when this is not true for virtual
	table access folding machinery. Ignoring GCC bugs it can happen only in
	programs with undefined effect. Falss warnings are useful to detect
        changes where type inheritance graph is broken.
part 3: support for One Definition Rule matching at LTO time.
        This needs fix to types_same_for_odr I posted to the original
        thread.
	It also adds logic to match multiple different tree types together,
	detect some ODR violations, eliminate external vtables when internal
	are around and other bit subtle details.
part 4: Use of ipa-devirt in unreachable function removal (that is done at
	LTO time, too)
part 5: Do devirtualization for types in local namespaces where there is
	only one candidate.
part 6: Do speculative devirtualization when there is only one candidate.
	This will handle Firefox's addref.

All these changes are conceptually very easy. I found number of weird things
and little improvements in the current devirt code I will send independently.
I am also trying to keep code flexible so it allows further development:
such as tracking of multiple type variants with same targets thorough
ipa-cp and folding and such.  The dynamic type changes seems somewhat
scary so I want to keep code simple at start to be sure it won't produce
wrong code. (in the plan above only part 5 risks bad code).

This is part 1 of the series. Nothing really fancy is going on here.
It adds code to cgraph.c to notice all polymorphic calls and
record the info.  

I also added virtual_method_call_p that disables the devirtualization
code path from trying to do something on the OBJ-C OBJ_TYPE_REF use
because it can not work as currently written.  If we remove it, we can
use test for OBJ_TYPE_REF again instead.

Boostrapped/regtested x86_64-linux, testing ppc64-linux and will
commit it if passes and there are no objections.

Honza

	* cgraph.c (cgraph_create_indirect_edge): Discover
	polymorphic calls and record basic info into indirect_info.
	* gimple-fold.c (gimple_fold_call): When doing BINFO based
	devirtualization, ignore objc function calls.
	* ipa-cp.c (initialize_node_lattices): Be ready for polymorphic
	call with no parm index info.
	* ipa-prop.c (ipa_analyze_call_uses): Likewise.
	* tree.c (virtual_method_call_p): New function.
	* tree.h (virtual_method_call_p): Declare.
diff mbox

Patch

Index: cgraph.c
===================================================================
--- cgraph.c	(revision 201814)
+++ cgraph.c	(working copy)
@@ -925,6 +925,7 @@  cgraph_create_indirect_edge (struct cgra
 {
   struct cgraph_edge *edge = cgraph_create_edge_1 (caller, NULL, call_stmt,
 						   count, freq);
+  tree target;
 
   edge->indirect_unknown_callee = 1;
   initialize_inline_failed (edge);
@@ -932,6 +933,23 @@  cgraph_create_indirect_edge (struct cgra
   edge->indirect_info = cgraph_allocate_init_indirect_info ();
   edge->indirect_info->ecf_flags = ecf_flags;
 
+  /* Record polymorphic call info.  */
+  if (call_stmt
+      && (target = gimple_call_fn (call_stmt))
+      && virtual_method_call_p (target))
+    {
+      tree type = obj_type_ref_class (target);
+
+
+      /* Only record types can have virtual calls.  */
+      gcc_assert (TREE_CODE (type) == RECORD_TYPE);
+      edge->indirect_info->param_index = -1;
+      edge->indirect_info->otr_token
+	 = tree_low_cst (OBJ_TYPE_REF_TOKEN (target), 1);
+      edge->indirect_info->otr_type = type;
+      edge->indirect_info->polymorphic = 1;
+    }
+
   edge->next_callee = caller->indirect_calls;
   if (caller->indirect_calls)
     caller->indirect_calls->prev_callee = edge;
Index: gimple-fold.c
===================================================================
--- gimple-fold.c	(revision 201814)
+++ gimple-fold.c	(working copy)
@@ -1105,7 +1105,7 @@  gimple_fold_call (gimple_stmt_iterator *
 	  gimple_call_set_fn (stmt, OBJ_TYPE_REF_EXPR (callee));
 	  changed = true;
 	}
-      else
+      else if (virtual_method_call_p (callee))
 	{
 	  tree obj = OBJ_TYPE_REF_OBJECT (callee);
 	  tree binfo = gimple_extract_devirt_binfo_from_cst
Index: ipa-cp.c
===================================================================
--- ipa-cp.c	(revision 201814)
+++ ipa-cp.c	(working copy)
@@ -734,7 +734,8 @@  initialize_node_lattices (struct cgraph_
     }
 
   for (ie = node->indirect_calls; ie; ie = ie->next_callee)
-    if (ie->indirect_info->polymorphic)
+    if (ie->indirect_info->polymorphic
+        && ie->indirect_info->param_index >= 0)
       {
 	gcc_checking_assert (ie->indirect_info->param_index >= 0);
 	ipa_get_parm_lattices (info,
Index: ipa-prop.c
===================================================================
--- ipa-prop.c	(revision 201814)
+++ ipa-prop.c	(working copy)
@@ -1922,7 +1922,7 @@  ipa_analyze_call_uses (struct cgraph_nod
     return;
   if (TREE_CODE (target) == SSA_NAME)
     ipa_analyze_indirect_call_uses (node, info, parms_ainfo, call, target);
-  else if (TREE_CODE (target) == OBJ_TYPE_REF)
+  else if (virtual_method_call_p (target))
     ipa_analyze_virtual_call_uses (node, info, call, target);
 }
 
Index: tree.c
===================================================================
--- tree.c	(revision 201817)
+++ tree.c	(working copy)
@@ -11864,6 +11864,27 @@  types_same_for_odr (tree type1, tree typ
   return true;
 }
 
+/* TARGET is a call target of GIMPLE call statement
+   (obtained by gimple_call_fn).  Return true if it is
+   OBJ_TYPE_REF representing an virtual call of C++ method.
+   (As opposed to OBJ_TYPE_REF representing objc calls
+   through a cast where middle-end devirtualization machinery
+   can't apply.) */
+
+bool
+virtual_method_call_p (tree target)
+{
+  if (TREE_CODE (target) != OBJ_TYPE_REF)
+    return false;
+  target = TREE_TYPE (target);
+  gcc_checking_assert (TREE_CODE (target) == POINTER_TYPE);
+  target = TREE_TYPE (target);
+  if (TREE_CODE (target) == FUNCTION_TYPE)
+    return false;
+  gcc_checking_assert (TREE_CODE (target) == METHOD_TYPE);
+  return true;
+}
+
 /* REF is OBJ_TYPE_REF, return the class the ref corresponds to.  */
 
 tree
Index: tree.h
===================================================================
--- tree.h	(revision 201814)
+++ tree.h	(working copy)
@@ -5974,6 +5974,7 @@  extern location_t tree_nonartificial_loc
 extern tree block_ultimate_origin (const_tree);
 
 extern tree get_binfo_at_offset (tree, HOST_WIDE_INT, tree);
+extern bool virtual_method_call_p (tree);
 extern tree obj_type_ref_class (tree ref);
 extern bool types_same_for_odr (tree type1, tree type2);
 extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *,