Patchwork Split cfgexpand and var-tracking timevars

Submitter Jan Hubicka
Date July 3, 2010, 3:25 p.m.
Message ID <20100703152527.GI6378@kam.mff.cuni.cz>
Permalink /patch/57812/
State New

Comments

Jan Hubicka - July 3, 2010, 3:25 p.m.
Hi,
this patch breaks out the expansion and var-tracking timevars to answer some
questions I got after posting the LTO build numbers.  Now I get:

 garbage collection    :  11.54 ( 2%) usr   0.28 ( 3%) sys  11.82 ( 2%) wall       0 kB ( 0%) ggc
 callgraph optimization:   0.54 ( 0%) usr   0.00 ( 0%) sys   0.54 ( 0%) wall       0 kB ( 0%) ggc
 varpool construction  :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall    7046 kB ( 0%) ggc
 ipa cp                :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall   18172 kB ( 1%) ggc
 ipa lto gimple I/O    :   7.24 ( 1%) usr   0.94 (10%) sys   8.32 ( 2%) wall  881022 kB (28%) ggc
 ipa lto decl I/O      :   4.82 ( 1%) usr   0.20 ( 2%) sys   5.05 ( 1%) wall  249171 kB ( 8%) ggc
 ipa lto decl init I/O :   0.40 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall   55386 kB ( 2%) ggc
 ipa lto cgraph I/O    :   0.16 ( 0%) usr   0.02 ( 0%) sys   0.18 ( 0%) wall   50866 kB ( 2%) ggc
 ipa lto decl merge    :   1.70 ( 0%) usr   0.06 ( 1%) sys   1.75 ( 0%) wall      29 kB ( 0%) ggc
 ipa lto cgraph merge  :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall    6831 kB ( 0%) ggc
 ipa reference         :   0.53 ( 0%) usr   0.03 ( 0%) sys   0.56 ( 0%) wall       0 kB ( 0%) ggc
 ipa profile           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.68 ( 0%) usr   0.00 ( 0%) sys   0.65 ( 0%) wall    1068 kB ( 0%) ggc
 cfg cleanup           :   8.98 ( 2%) usr   0.04 ( 0%) sys   9.20 ( 2%) wall   30697 kB ( 1%) ggc
 trivially dead code   :   2.83 ( 1%) usr   0.04 ( 0%) sys   3.09 ( 1%) wall       0 kB ( 0%) ggc
 df multiple defs      :   1.78 ( 0%) usr   0.04 ( 0%) sys   1.87 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs      :   4.72 ( 1%) usr   0.05 ( 1%) sys   4.69 ( 1%) wall       0 kB ( 0%) ggc
 df live regs          :  23.07 ( 5%) usr   0.08 ( 1%) sys  23.04 ( 5%) wall       0 kB ( 0%) ggc
 df live&initialized regs:  12.43 ( 2%) usr   0.06 ( 1%) sys  12.48 ( 2%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   2.53 ( 1%) usr   0.00 ( 0%) sys   2.63 ( 1%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   9.57 ( 2%) usr   0.05 ( 1%) sys   9.89 ( 2%) wall   72343 kB ( 2%) ggc
 register information  :   3.00 ( 1%) usr   0.01 ( 0%) sys   3.12 ( 1%) wall       0 kB ( 0%) ggc
 alias analysis        :   9.51 ( 2%) usr   0.03 ( 0%) sys   9.60 ( 2%) wall  200305 kB ( 6%) ggc
 alias stmt walking    :   4.83 ( 1%) usr   0.71 ( 8%) sys   5.69 ( 1%) wall    7104 kB ( 0%) ggc
 register scan         :   1.10 ( 0%) usr   0.00 ( 0%) sys   1.21 ( 0%) wall    1946 kB ( 0%) ggc
 rebuild jump labels   :   2.10 ( 0%) usr   0.00 ( 0%) sys   1.96 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.32 ( 0%) usr   0.07 ( 1%) sys   0.69 ( 0%) wall   23250 kB ( 1%) ggc
 inline heuristics     :   2.17 ( 0%) usr   0.08 ( 1%) sys   2.45 ( 0%) wall   74899 kB ( 2%) ggc
 integration           :   7.40 ( 1%) usr   0.56 ( 6%) sys   8.21 ( 2%) wall  699599 kB (22%) ggc
 tree CFG cleanup      :   6.72 ( 1%) usr   0.12 ( 1%) sys   6.48 ( 1%) wall   18567 kB ( 1%) ggc
 tree VRP              :  11.31 ( 2%) usr   0.32 ( 3%) sys  11.43 ( 2%) wall  319191 kB (10%) ggc
 tree copy propagation :   2.48 ( 0%) usr   0.04 ( 0%) sys   2.39 ( 0%) wall   14598 kB ( 0%) ggc
 tree PTA              :   5.39 ( 1%) usr   0.01 ( 0%) sys   5.28 ( 1%) wall   42713 kB ( 1%) ggc
 tree SSA rewrite      :   2.77 ( 1%) usr   0.04 ( 0%) sys   3.09 ( 1%) wall   50979 kB ( 2%) ggc
 tree SSA incremental  :   5.76 ( 1%) usr   0.29 ( 3%) sys   5.55 ( 1%) wall   53118 kB ( 2%) ggc
 tree operand scan     :   3.17 ( 1%) usr   1.42 (15%) sys   3.73 ( 1%) wall  444654 kB (14%) ggc
 dominator optimization:   5.77 ( 1%) usr   0.02 ( 0%) sys   5.87 ( 1%) wall  122530 kB ( 4%) ggc
 tree SRA              :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.26 ( 0%) wall    2961 kB ( 0%) ggc
 tree CCP              :   2.15 ( 0%) usr   0.02 ( 0%) sys   2.41 ( 0%) wall   12129 kB ( 0%) ggc
 tree PHI const/copy prop:   0.19 ( 0%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall    1679 kB ( 0%) ggc
 tree split crit edges :   0.63 ( 0%) usr   0.01 ( 0%) sys   0.46 ( 0%) wall   84149 kB ( 3%) ggc
 tree reassociation    :   0.83 ( 0%) usr   0.01 ( 0%) sys   0.94 ( 0%) wall   15332 kB ( 0%) ggc
 tree PRE              :  28.01 ( 6%) usr   0.22 ( 2%) sys  28.32 ( 6%) wall  223553 kB ( 7%) ggc
 tree FRE              :   5.65 ( 1%) usr   0.22 ( 2%) sys   6.10 ( 1%) wall   27160 kB ( 1%) ggc
 tree code sinking     :   0.66 ( 0%) usr   0.02 ( 0%) sys   0.82 ( 0%) wall    8983 kB ( 0%) ggc
 tree linearize phis   :   0.52 ( 0%) usr   0.04 ( 0%) sys   0.54 ( 0%) wall    2158 kB ( 0%) ggc
 tree forward propagate:   0.72 ( 0%) usr   0.02 ( 0%) sys   0.79 ( 0%) wall   18358 kB ( 1%) ggc
 tree phiprop          :   0.10 ( 0%) usr   0.02 ( 0%) sys   0.08 ( 0%) wall     307 kB ( 0%) ggc
 tree conservative DCE :   1.60 ( 0%) usr   0.24 ( 3%) sys   2.03 ( 0%) wall    1904 kB ( 0%) ggc
 tree aggressive DCE   :   1.42 ( 0%) usr   0.09 ( 1%) sys   1.41 ( 0%) wall   38880 kB ( 1%) ggc
 tree buildin call DCE :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.88 ( 0%) usr   0.01 ( 0%) sys   0.85 ( 0%) wall    4062 kB ( 0%) ggc
 PHI merge             :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall    4838 kB ( 0%) ggc
 tree loop bounds      :   0.65 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall    8531 kB ( 0%) ggc
 tree loop invariant motion:   1.03 ( 0%) usr   0.00 ( 0%) sys   0.96 ( 0%) wall    1452 kB ( 0%) ggc
 tree canonical iv     :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall    5122 kB ( 0%) ggc
 scev constant prop    :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall   16752 kB ( 1%) ggc
 tree loop unswitching :   0.25 ( 0%) usr   0.02 ( 0%) sys   0.30 ( 0%) wall   18321 kB ( 1%) ggc
 complete unrolling    :   0.72 ( 0%) usr   0.03 ( 0%) sys   0.93 ( 0%) wall   42447 kB ( 1%) ggc
 tree vectorization    :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall   18424 kB ( 1%) ggc
 tree slp vectorization:   3.48 ( 1%) usr   0.02 ( 0%) sys   3.78 ( 1%) wall  288800 kB ( 9%) ggc
 tree iv optimization  :   1.86 ( 0%) usr   0.04 ( 0%) sys   2.11 ( 0%) wall   79505 kB ( 3%) ggc
 predictive commoning  :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall   11598 kB ( 0%) ggc
 tree loop init        :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.72 ( 0%) wall   19549 kB ( 1%) ggc
 tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree copy headers     :   0.28 ( 0%) usr   0.01 ( 0%) sys   0.35 ( 0%) wall   28294 kB ( 1%) ggc
 tree SSA uncprop      :   0.38 ( 0%) usr   0.00 ( 0%) sys   0.29 ( 0%) wall       0 kB ( 0%) ggc
 tree NRV optimization :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     278 kB ( 0%) ggc
 tree rename SSA copies:   0.61 ( 0%) usr   0.01 ( 0%) sys   0.53 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.85 ( 0%) usr   0.02 ( 0%) sys   0.77 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   5.67 ( 1%) usr   0.07 ( 1%) sys   5.92 ( 1%) wall       0 kB ( 0%) ggc
 control dependences   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa            :   2.27 ( 0%) usr   0.01 ( 0%) sys   2.37 ( 0%) wall    1539 kB ( 0%) ggc
 expand vars           :   3.55 ( 1%) usr   0.00 ( 0%) sys   3.71 ( 1%) wall   85419 kB ( 3%) ggc
 expand                :  38.67 ( 8%) usr   0.56 ( 6%) sys  39.27 ( 8%) wall  801014 kB (25%) ggc
 post expand cleanups  :   0.78 ( 0%) usr   0.02 ( 0%) sys   0.81 ( 0%) wall   70191 kB ( 2%) ggc
 varconst              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 lower subreg          :   0.08 ( 0%) usr   0.01 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 jump                  :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 forward prop          :   5.00 ( 1%) usr   0.09 ( 1%) sys   4.92 ( 1%) wall   67389 kB ( 2%) ggc
 CSE                   :  12.53 ( 3%) usr   0.03 ( 0%) sys  12.12 ( 2%) wall   18902 kB ( 1%) ggc
 dead code elimination :   2.47 ( 0%) usr   0.02 ( 0%) sys   2.62 ( 1%) wall       0 kB ( 0%) ggc
 dead store elim1      :   4.12 ( 1%) usr   0.02 ( 0%) sys   4.28 ( 1%) wall   42608 kB ( 1%) ggc
 dead store elim2      :   4.04 ( 1%) usr   0.02 ( 0%) sys   3.77 ( 1%) wall   50827 kB ( 2%) ggc
 loop analysis         :   0.50 ( 0%) usr   0.01 ( 0%) sys   0.45 ( 0%) wall   15084 kB ( 0%) ggc
 loop invariant motion :   1.67 ( 0%) usr   0.01 ( 0%) sys   1.71 ( 0%) wall    1691 kB ( 0%) ggc
 loop unswitching      :   0.59 ( 0%) usr   0.01 ( 0%) sys   0.56 ( 0%) wall     484 kB ( 0%) ggc
 CPROP                 :  10.36 ( 2%) usr   0.04 ( 0%) sys  10.92 ( 2%) wall   93818 kB ( 3%) ggc
 PRE                   :   8.96 ( 2%) usr   0.04 ( 0%) sys   8.89 ( 2%) wall   13722 kB ( 0%) ggc
 code hoisting         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      21 kB ( 0%) ggc
 CSE 2                 :   7.07 ( 1%) usr   0.02 ( 0%) sys   7.28 ( 1%) wall   11711 kB ( 0%) ggc
 branch prediction     :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall      32 kB ( 0%) ggc
 combiner              :  15.08 ( 3%) usr   0.08 ( 1%) sys  15.66 ( 3%) wall  250426 kB ( 8%) ggc
 if-conversion         :   3.01 ( 1%) usr   0.01 ( 0%) sys   2.92 ( 1%) wall   31718 kB ( 1%) ggc
 regmove               :   1.39 ( 0%) usr   0.01 ( 0%) sys   1.53 ( 0%) wall     376 kB ( 0%) ggc
 integrated RA         :  29.25 ( 6%) usr   0.05 ( 1%) sys  28.78 ( 6%) wall  129980 kB ( 4%) ggc
 reload                :  12.15 ( 2%) usr   0.08 ( 1%) sys  12.14 ( 2%) wall   39196 kB ( 1%) ggc
 reload CSE regs       :   8.59 ( 2%) usr   0.00 ( 0%) sys   8.44 ( 2%) wall  110975 kB ( 3%) ggc
 load CSE after reload :   0.96 ( 0%) usr   0.00 ( 0%) sys   1.10 ( 0%) wall     606 kB ( 0%) ggc
 zee                   :   0.85 ( 0%) usr   0.03 ( 0%) sys   0.80 ( 0%) wall     400 kB ( 0%) ggc
 thread pro- & epilogue:   1.98 ( 0%) usr   0.01 ( 0%) sys   1.84 ( 0%) wall   44591 kB ( 1%) ggc
 if-conversion 2       :   0.87 ( 0%) usr   0.01 ( 0%) sys   0.85 ( 0%) wall    8193 kB ( 0%) ggc
 combine stack adjustments:   0.38 ( 0%) usr   0.01 ( 0%) sys   0.36 ( 0%) wall       1 kB ( 0%) ggc
 peephole 2            :   1.35 ( 0%) usr   0.01 ( 0%) sys   1.25 ( 0%) wall   22154 kB ( 1%) ggc
 hard reg cprop        :   2.96 ( 1%) usr   0.03 ( 0%) sys   3.36 ( 1%) wall    2352 kB ( 0%) ggc
 scheduling 2          :  15.25 ( 3%) usr   0.09 ( 1%) sys  15.33 ( 3%) wall    7224 kB ( 0%) ggc
 machine dep reorg     :   2.63 ( 1%) usr   0.00 ( 0%) sys   2.34 ( 0%) wall    2880 kB ( 0%) ggc
 reorder blocks        :   2.58 ( 1%) usr   0.01 ( 0%) sys   2.89 ( 1%) wall   71277 kB ( 2%) ggc
 final                 :   9.27 ( 2%) usr   0.66 ( 7%) sys  10.19 ( 2%) wall  145100 kB ( 5%) ggc
 variable output       :   0.44 ( 0%) usr   0.02 ( 0%) sys   0.45 ( 0%) wall    5092 kB ( 0%) ggc
 symout                :   6.91 ( 1%) usr   0.42 ( 4%) sys   7.34 ( 1%) wall  414781 kB (13%) ggc
 variable tracking     :   8.64 ( 2%) usr   0.05 ( 1%) sys   9.16 ( 2%) wall  192447 kB ( 6%) ggc
 var-tracking dataflow :  24.20 ( 5%) usr   0.05 ( 1%) sys  23.70 ( 5%) wall       0 kB ( 0%) ggc
 var-tracking emit     :  15.30 ( 3%) usr   0.04 ( 0%) sys  14.78 ( 3%) wall  179536 kB ( 6%) ggc
 tree if-combine       :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      71 kB ( 0%) ggc
 uninit var anaysis    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 497.36             9.38           507.43            3172254 kB

So var-tracking dataflow seems to be the most expensive part of var-tracking, but it is only about 50% of the problem.
In expansion, the time goes to the actual RTL expansion, not to variable packing or the out-of-ssa code.
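
The two shares can be checked by hand from the usr column of the report above. The following is only a sketch of that arithmetic: the helper names are made up for illustration, and it assumes that nested timevars are reported exclusively, so that the "variable tracking" and "expand" rows no longer include the time of the newly split-out rows.

```c
#include <assert.h>
#include <stddef.h>

/* usr seconds copied from the report above.  The array and function
   names are hypothetical; they are not part of GCC.  */

/* variable tracking, var-tracking dataflow, var-tracking emit  */
const double var_tracking_usr[] = { 8.64, 24.20, 15.30 };

/* out of ssa, expand vars, expand, post expand cleanups  */
const double expansion_usr[] = { 2.27, 3.55, 38.67, 0.78 };

/* Sum N entries of SECS.  */
double
sum_seconds (const double *secs, size_t n)
{
  double total = 0.0;
  for (size_t i = 0; i < n; i++)
    total += secs[i];
  return total;
}

/* Return PART as a percentage of TOTAL.  */
double
share_percent (double part, double total)
{
  assert (total > 0.0);
  return 100.0 * part / total;
}

/* Share of all var-tracking time spent in the dataflow phase.  */
double
var_tracking_dataflow_share (void)
{
  return share_percent (var_tracking_usr[1],
                        sum_seconds (var_tracking_usr, 3));
}

/* Share of all expansion time spent in RTL expansion proper.  */
double
rtl_expand_share (void)
{
  return share_percent (expansion_usr[2],
                        sum_seconds (expansion_usr, 4));
}
```

Under that assumption, var-tracking totals 48.14 s with dataflow at roughly 50% of it, and expansion totals 45.27 s with RTL expansion proper at roughly 85%.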

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* timevar.def (TV_OUT_OF_SSA, TV_VAR_EXPAND, TV_POST_EXPAND,
	TV_VAR_TRACKING_DATAFLOW, TV_VAR_TRACKING_EMIT): New timevars.
	* cfgexpand.c (gimple_expand_cfg): Use new timevars.
	* var-tracking.c (vt_find_locations, variable_tracking_main_1):
	Likewise.
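
To illustrate why the new timevars subtract from their parent rather than double-count, here is a toy model of a push/pop timer stack. It is a sketch only: every `toy_*` name is hypothetical, the clock is faked and the durations are invented, and it assumes the behaviour GCC's timevar code describes, namely that pushing a timevar stops attributing elapsed time to the one below it until the matching pop.

```c
#include <assert.h>

/* Hypothetical stand-ins for the three var-tracking timevars.  */
enum toy_timevar
{
  TOY_VAR_TRACKING,
  TOY_VAR_TRACKING_DATAFLOW,
  TOY_VAR_TRACKING_EMIT,
  TOY_NUM_TIMEVARS
};

#define TOY_MAX_DEPTH 8

static double toy_clock;                     /* fake clock, in seconds */
static double toy_last_switch;               /* clock at last push/pop */
static double toy_elapsed[TOY_NUM_TIMEVARS]; /* exclusive time per timer */
static enum toy_timevar toy_stack[TOY_MAX_DEPTH];
static int toy_depth;

/* Charge the time since the last push/pop to the timer on top.  */
static void
toy_charge_top (void)
{
  if (toy_depth > 0)
    toy_elapsed[toy_stack[toy_depth - 1]] += toy_clock - toy_last_switch;
  toy_last_switch = toy_clock;
}

/* Start attributing time to TV instead of the current top.  */
void
toy_push (enum toy_timevar tv)
{
  assert (toy_depth < TOY_MAX_DEPTH);
  toy_charge_top ();
  toy_stack[toy_depth++] = tv;
}

/* Stop TV, which must be on top; time goes back to its parent.  */
void
toy_pop (enum toy_timevar tv)
{
  assert (toy_depth > 0 && toy_stack[toy_depth - 1] == tv);
  toy_charge_top ();
  toy_depth--;
}

/* Pretend SECONDS of work happened.  */
void
toy_work (double seconds)
{
  toy_clock += seconds;
}

/* Exclusive seconds recorded for TV so far.  */
double
toy_seconds (enum toy_timevar tv)
{
  return toy_elapsed[tv];
}

/* Replay the nesting the patch creates, with invented durations.  */
void
toy_replay (void)
{
  toy_push (TOY_VAR_TRACKING);
  toy_work (1.0);                       /* setup in the parent */
  toy_push (TOY_VAR_TRACKING_DATAFLOW);
  toy_work (3.0);                       /* dataflow phase */
  toy_pop (TOY_VAR_TRACKING_DATAFLOW);
  toy_push (TOY_VAR_TRACKING_EMIT);
  toy_work (2.0);                       /* note emission */
  toy_pop (TOY_VAR_TRACKING_EMIT);
  toy_work (0.5);                       /* finalization in the parent */
  toy_pop (TOY_VAR_TRACKING);
}
```

After `toy_replay ()`, the parent timer holds only the 1.5 s spent outside the two nested phases, which matches how the "variable tracking" row shrinks once the dataflow and emit rows are split out.
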
Diego Novillo - July 3, 2010, 3:54 p.m.
OK.

Diego.

Patch

Index: timevar.def
===================================================================
--- timevar.def	(revision 161774)
+++ timevar.def	(working copy)
@@ -172,7 +172,10 @@  DEFTIMEVAR (TV_DOMINANCE             , "
 DEFTIMEVAR (TV_CONTROL_DEPENDENCES   , "control dependences")
 DEFTIMEVAR (TV_OVERLOAD              , "overload resolution")
 DEFTIMEVAR (TV_TEMPLATE_INSTANTIATION, "template instantiation")
+DEFTIMEVAR (TV_OUT_OF_SSA	     , "out of ssa")
+DEFTIMEVAR (TV_VAR_EXPAND	     , "expand vars")
 DEFTIMEVAR (TV_EXPAND		     , "expand")
+DEFTIMEVAR (TV_POST_EXPAND	     , "post expand cleanups")
 DEFTIMEVAR (TV_VARCONST              , "varconst")
 DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
 DEFTIMEVAR (TV_JUMP                  , "jump")
@@ -226,6 +229,8 @@  DEFTIMEVAR (TV_FINAL                 , "
 DEFTIMEVAR (TV_VAROUT                , "variable output")
 DEFTIMEVAR (TV_SYMOUT                , "symout")
 DEFTIMEVAR (TV_VAR_TRACKING          , "variable tracking")
+DEFTIMEVAR (TV_VAR_TRACKING_DATAFLOW , "var-tracking dataflow")
+DEFTIMEVAR (TV_VAR_TRACKING_EMIT     , "var-tracking emit")
 DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-combine")
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var anaysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 161774)
+++ cfgexpand.c	(working copy)
@@ -3764,7 +3764,9 @@  gimple_expand_cfg (void)
   edge e;
   unsigned i;
 
+  timevar_push (TV_OUT_OF_SSA);
   rewrite_out_of_ssa (&SA);
+  timevar_pop (TV_OUT_OF_SSA);
   SA.partition_to_pseudo = (rtx *)xcalloc (SA.map->num_partitions,
 					   sizeof (rtx));
 
@@ -3807,7 +3809,9 @@  gimple_expand_cfg (void)
 
 
   /* Expand the variables recorded during gimple lowering.  */
+  timevar_push (TV_VAR_EXPAND);
   expand_used_vars ();
+  timevar_pop (TV_VAR_EXPAND);
 
   /* Honor stack protection warnings.  */
   if (warn_stack_protect)
@@ -3887,8 +3891,11 @@  gimple_expand_cfg (void)
     expand_debug_locations ();
 
   execute_free_datastructures ();
+  timevar_push (TV_OUT_OF_SSA);
   finish_out_of_ssa (&SA);
+  timevar_pop (TV_OUT_OF_SSA);
 
+  timevar_push (TV_POST_EXPAND);
   /* We are no longer in SSA form.  */
   cfun->gimple_df->in_ssa_p = false;
 
@@ -3998,6 +4005,7 @@  gimple_expand_cfg (void)
      the common parent easily.  */
   set_block_levels (DECL_INITIAL (cfun->decl), 0);
   default_rtl_profile ();
+  timevar_pop (TV_POST_EXPAND);
   return 0;
 }
 
Index: var-tracking.c
===================================================================
--- var-tracking.c	(revision 161774)
+++ var-tracking.c	(working copy)
@@ -5992,6 +5992,7 @@  vt_find_locations (void)
   int htabmax = PARAM_VALUE (PARAM_MAX_VARTRACK_SIZE);
   bool success = true;
 
+  timevar_push (TV_VAR_TRACKING_DATAFLOW);
   /* Compute reverse completion order of depth first search of the CFG
      so that the data-flow runs faster.  */
   rc_order = XNEWVEC (int, n_basic_blocks - NUM_FIXED_BLOCKS);
@@ -6027,6 +6028,7 @@  vt_find_locations (void)
 	{
 	  bb = (basic_block) fibheap_extract_min (worklist);
 	  RESET_BIT (in_worklist, bb->index);
+	  gcc_assert (!TEST_BIT (visited, bb->index));
 	  if (!TEST_BIT (visited, bb->index))
 	    {
 	      bool changed;
@@ -6179,6 +6181,7 @@  vt_find_locations (void)
   sbitmap_free (in_worklist);
   sbitmap_free (in_pending);
 
+  timevar_pop (TV_VAR_TRACKING_DATAFLOW);
   return success;
 }
 
@@ -8534,7 +8537,9 @@  variable_tracking_main_1 (void)
       dump_flow_info (dump_file, dump_flags);
     }
 
+  timevar_push (TV_VAR_TRACKING_EMIT);
   vt_emit_notes ();
+  timevar_pop (TV_VAR_TRACKING_EMIT);
 
   vt_finalize ();
   vt_debug_insns_local (false);