Patchwork More df_worklist_dataflow improvements

login
register
mail settings
Submitter Jan Hubicka
Date June 22, 2010, 6:30 p.m.
Message ID <20100622183006.GB23996@kam.mff.cuni.cz>
Download mbox | patch
Permalink /patch/56561/
State New
Headers show

Comments

Jan Hubicka - June 22, 2010, 6:30 p.m.
Hi,
this patch aims to reduce worklist bitmap traffic in df-core.  It reduces
builds times from 9m32s to 9m15s on my setup (average from 3 runs, each run has
less than 10s noise).

The patch adds IN_WORKLIST flag for BB that says that BB is in worklist
(either workslit or pending).  To make lookups fast, I use 1 bit of aux field
used currently for age structure.

The internal loop of solver walks edges and compute confluences and after
updating the transfer function it walks edges again and inserts everything
to worklist.  Those loops all test considered bitmap, that is rather sily.

The first loop consider only BBs with AGE younger than last visit.  The patch
aranges the last visits to start at 1, so AGEs of BBs not considered are all 0
and confluence is never computed on them.

The second loop inserts only things not in worklist yet, so the patch arranges
the IN_WORKLIST flag to be true for all basic blocks initially that makes not
considered blocks to never enter the queues.

The profile now looks as follows:
               :      EXECUTE_IF_SET_IN_BITMAP (worklist, 0, index, bi)
               :        {
               :          unsigned bb_index;
   117  0.0101 :          dcount++;
               :
   470  0.0404 :          bb_index = blocks_in_postorder[index];
  1320  0.1136 :          bb = BASIC_BLOCK (bb_index);
   135  0.0116 :          prev_age = VEC_index (int, last_visit_age, index) + 1;
               :          CLEAR_BB_IN_WORKLIST (bb);
     2 1.7e-04 :          if (dir == DF_FORWARD)
    61  0.0052 :            changed = df_worklist_propagate_forward (dataflow, bb_index,
               :                                                     bbindex_to_postorder,
               :                                                     pending,
               :                                                     prev_age);
               :          else
    21  0.0018 :            changed = df_worklist_propagate_backward (dataflow, bb_index,
               :                                                      bbindex_to_postorder,
               :                                                      pending,
               :                                                      prev_age);
   269  0.0231 :          VEC_replace (int, last_visit_age, index, age++);
   159  0.0137 :          if (changed)
   192  0.0165 :            SET_BB_LAST_CHANGE_AGE (bb, age);
               :        }
    11 9.5e-04 :      bitmap_clear (worklist);
               :    }

for df_worklist_dataflow_doublequeue and 
   28  0.0024 :  bool changed = age == 1;
               :
               :  /*  Calculate <conf_op> of incoming edges.  */
   339  0.0292 :  if (EDGE_COUNT (bb->preds) > 0)
    19  0.0016 :    FOR_EACH_EDGE (e, ei, bb->preds)
               :      {
    78  0.0067 :        if (age <= BB_LAST_CHANGE_AGE (e->src))
   206  0.0177 :          changed |= dataflow->problem->con_fun_n (e);
               :      }
    11 9.5e-04 :  else if (dataflow->problem->con_fun_0)
               :    dataflow->problem->con_fun_0 (bb);
               :
    62  0.0053 :  if (changed
   108  0.0093 :      && dataflow->problem->trans_fun (bb_index))
               :    {
               :      /* The out set of this block has changed.
               :         Propagate to the outgoing blocks.  */
    87  0.0075 :      FOR_EACH_EDGE (e, ei, bb->succs)
   393  0.0338 :        if (!BB_IN_WORKLIST (e->dest))
               :          {
     6 5.2e-04 :            unsigned ob_index = e->dest->index;
               :
    97  0.0083 :            SET_BB_IN_WORKLIST (e->dest);
     9 7.7e-04 :            bitmap_set_bit (pending, bbindex_to_postorder[ob_index]);
for propagation function.

I will experiment with changing the worklist implementation to arrays qsorted
after each iteration, bitmap is not really fitting this purpose well, that
should improve situation just at EXECUTE_IF_SET_IN_BITMAP (that itself is
pretty slow but not visible in oprofile since samples goes to bitmap.h iterator
implementation).

It seems it would make more sense to have the blocks_in_postorder containing BB pointers
rather than indexes too.

I've also reordered the loops for better cache locality.

The profile is as follows:
CPU: AMD64 family10, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000
samples  %        symbol name
47356     4.0743  htab_find_slot_with_hash
16628     1.4306  df_note_compute.42013.3719
14744     1.2685  fast_dce.487977.constprop.555
14451     1.2433  bitmap_set_bit_1
13575     1.1679  bitmap_ior_into
12050     1.0367  df_worklist_dataflow
11446     0.9848  ggc_internal_alloc_stat
11250     0.9679  htab_find_with_hash
10492     0.9027  record_reg_classes.105277.constprop.851.3375

There are reductions in df_worklist_dataflow, bitmap_set_bit1 and bitmap_clear-bit,
all about 0.2% (of overall sample count)

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* df-core.c (SET_BB_LAST_CHANGE_AGE, BB_IN_WORKLIST,
	SET_BB_IN_WORKLIST, CLEAR_BB_IN_WORKLIST): New macros.
	(BB_LAST_CHANGE_AGE): Do not use last bit of aux.
	(df_worklist_propagate_forward, df_worklist_propagate_backward):
	Minimal age is 1;
	do not try to add BB already in worklist; do not use considered bitmap.
	(df_worklist_dataflow_doublequeue): Minimal age is 1; do not clear aux
	fields; do not take conisdered bitmaps, managed BB_IN_WORKLIST flag.
	(df_worklist_dataflow): Do not initialize considered bitmap;
	invalidate unused bbindex_to_postorder only when checking; initialize
	LAST_CHANGE_AGE to 1 on BBs we care about; update call of
	df_worklist_dataflow_doublequeue; clear aux fileds when done.

Patch

Index: df-core.c
===================================================================
--- df-core.c	(revision 161199)
+++ df-core.c	(working copy)
@@ -851,8 +851,26 @@  struct rtl_opt_pass pass_df_finish =
    The general data flow analysis engine.
 ----------------------------------------------------------------------------*/
 
+/* We use bb->aux to store information about age and worklist.  We always inspect BB structure
+   when accessing those, so this leads to good memory locality.
+
+   First bit of bb->aux is used to indicate whther BB is in worklist.  BBs are all having
+   aux 0 and they are initially in worklist, so value of 0 indicate that BB is in worklist
+   and value of 1 indicate that BB is not.  This is because we want to pretend BBs not
+   considered to be in worklist (to avoid inserting them) and this way we save need
+   to initialize their AUX fields.
+
+   Other bits of AUX are used to represent age.  */
+
 /* Return time BB when it was visited for last time.  */
-#define BB_LAST_CHANGE_AGE(bb) ((ptrdiff_t)(bb)->aux)
+#define BB_LAST_CHANGE_AGE(bb) (((ptrdiff_t)(bb)->aux) >> 1)
+/* Set BB's visit time. */
+#define SET_BB_LAST_CHANGE_AGE(bb, age) (bb)->aux = (void *)(ptrdiff_t)((age) << 1 | ((ptrdiff_t)bb->aux & 1))
+/* Return true when BB scheduled for processing, either in worklist or pending.  */
+#define BB_IN_WORKLIST(bb) (!((ptrdiff_t)(bb)->aux & 1))
+/* Set and clear BB in worklist flag.  */
+#define SET_BB_IN_WORKLIST(bb) ((bb)->aux = (void *)((ptrdiff_t)(bb)->aux & ~1))
+#define CLEAR_BB_IN_WORKLIST(bb) ((bb)->aux = (void *)((ptrdiff_t)(bb)->aux | 1))
 
 /* Helper function for df_worklist_dataflow.
    Propagate the dataflow forward.
@@ -876,21 +894,19 @@  df_worklist_propagate_forward (struct da
                                unsigned bb_index,
                                unsigned *bbindex_to_postorder,
                                bitmap pending,
-                               sbitmap considered,
 			       ptrdiff_t age)
 {
   edge e;
   edge_iterator ei;
   basic_block bb = BASIC_BLOCK (bb_index);
-  bool changed = !age;
+  bool changed = age == 1;
 
   /*  Calculate <conf_op> of incoming edges.  */
   if (EDGE_COUNT (bb->preds) > 0)
     FOR_EACH_EDGE (e, ei, bb->preds)
       {
-        if (age <= BB_LAST_CHANGE_AGE (e->src)
-	    && TEST_BIT (considered, e->src->index))
-          changed |= dataflow->problem->con_fun_n (e);
+	if (age <= BB_LAST_CHANGE_AGE (e->src))
+	  changed |= dataflow->problem->con_fun_n (e);
       }
   else if (dataflow->problem->con_fun_0)
     dataflow->problem->con_fun_0 (bb);
@@ -901,12 +917,13 @@  df_worklist_propagate_forward (struct da
       /* The out set of this block has changed.
          Propagate to the outgoing blocks.  */
       FOR_EACH_EDGE (e, ei, bb->succs)
-        {
-          unsigned ob_index = e->dest->index;
-
-          if (TEST_BIT (considered, ob_index))
-            bitmap_set_bit (pending, bbindex_to_postorder[ob_index]);
-        }
+	if (!BB_IN_WORKLIST (e->dest))
+	  {
+	    unsigned ob_index = e->dest->index;
+
+	    SET_BB_IN_WORKLIST (e->dest);
+	    bitmap_set_bit (pending, bbindex_to_postorder[ob_index]);
+	  }
       return true;
     }
   return false;
@@ -921,20 +938,18 @@  df_worklist_propagate_backward (struct d
                                 unsigned bb_index,
                                 unsigned *bbindex_to_postorder,
                                 bitmap pending,
-                                sbitmap considered,
 			        ptrdiff_t age)
 {
   edge e;
   edge_iterator ei;
   basic_block bb = BASIC_BLOCK (bb_index);
-  bool changed = !age;
+  bool changed = age == 1;
 
   /*  Calculate <conf_op> of incoming edges.  */
   if (EDGE_COUNT (bb->succs) > 0)
     FOR_EACH_EDGE (e, ei, bb->succs)
       {
-        if (age <= BB_LAST_CHANGE_AGE (e->dest)
-	    && TEST_BIT (considered, e->dest->index))
+        if (age <= BB_LAST_CHANGE_AGE (e->dest))
           changed |= dataflow->problem->con_fun_n (e);
       }
   else if (dataflow->problem->con_fun_0)
@@ -946,12 +961,13 @@  df_worklist_propagate_backward (struct d
       /* The out set of this block has changed.
          Propagate to the outgoing blocks.  */
       FOR_EACH_EDGE (e, ei, bb->preds)
-        {
-          unsigned ob_index = e->src->index;
-
-          if (TEST_BIT (considered, ob_index))
-            bitmap_set_bit (pending, bbindex_to_postorder[ob_index]);
-        }
+	if (!BB_IN_WORKLIST (e->src))
+	  {
+	    unsigned ob_index = e->src->index;
+
+	    SET_BB_IN_WORKLIST (e->src);
+	    bitmap_set_bit (pending, bbindex_to_postorder[ob_index]);
+	  }
       return true;
     }
   return false;
@@ -979,7 +995,6 @@  df_worklist_propagate_backward (struct d
 static void
 df_worklist_dataflow_doublequeue (struct dataflow *dataflow,
 			  	  bitmap pending,
-                                  sbitmap considered,
                                   int *blocks_in_postorder,
 				  unsigned *bbindex_to_postorder,
 				  int n_blocks)
@@ -987,12 +1002,11 @@  df_worklist_dataflow_doublequeue (struct
   enum df_flow_dir dir = dataflow->problem->dir;
   int dcount = 0;
   bitmap worklist = BITMAP_ALLOC (&df_bitmap_obstack);
-  int age = 0;
+  int age = 1;
   bool changed;
   VEC(int, heap) *last_visit_age = NULL;
   int prev_age;
   basic_block bb;
-  int i;
 
   VEC_safe_grow_cleared (int, heap, last_visit_age, n_blocks);
 
@@ -1013,28 +1027,26 @@  df_worklist_dataflow_doublequeue (struct
 	  unsigned bb_index;
 	  dcount++;
 
-	  bitmap_clear_bit (pending, index);
 	  bb_index = blocks_in_postorder[index];
 	  bb = BASIC_BLOCK (bb_index);
-	  prev_age = VEC_index (int, last_visit_age, index);
+	  prev_age = VEC_index (int, last_visit_age, index) + 1;
+	  CLEAR_BB_IN_WORKLIST (bb);
 	  if (dir == DF_FORWARD)
 	    changed = df_worklist_propagate_forward (dataflow, bb_index,
 						     bbindex_to_postorder,
-						     pending, considered,
+						     pending,
 						     prev_age);
 	  else
 	    changed = df_worklist_propagate_backward (dataflow, bb_index,
 						      bbindex_to_postorder,
-						      pending, considered,
+						      pending,
 						      prev_age);
-	  VEC_replace (int, last_visit_age, index, ++age);
+	  VEC_replace (int, last_visit_age, index, age++);
 	  if (changed)
-	    bb->aux = (void *)(ptrdiff_t)age;
+	    SET_BB_LAST_CHANGE_AGE (bb, age);
 	}
       bitmap_clear (worklist);
     }
-  for (i = 0; i < n_blocks; i++)
-    BASIC_BLOCK (blocks_in_postorder[i])->aux = NULL;
 
   BITMAP_FREE (worklist);
   BITMAP_FREE (pending);
@@ -1063,11 +1075,8 @@  df_worklist_dataflow (struct dataflow *d
                       int n_blocks)
 {
   bitmap pending = BITMAP_ALLOC (&df_bitmap_obstack);
-  sbitmap considered = sbitmap_alloc (last_basic_block);
-  bitmap_iterator bi;
   unsigned int *bbindex_to_postorder;
   int i;
-  unsigned int index;
   enum df_flow_dir dir = dataflow->problem->dir;
 
   gcc_assert (dir != DF_NONE);
@@ -1076,21 +1085,20 @@  df_worklist_dataflow (struct dataflow *d
   bbindex_to_postorder =
     (unsigned int *)xmalloc (last_basic_block * sizeof (unsigned int));
 
+#ifdef ENABLE_CHECKING
   /* Initialize the array to an out-of-bound value.  */
   for (i = 0; i < last_basic_block; i++)
     bbindex_to_postorder[i] = last_basic_block;
-
-  /* Initialize the considered map.  */
-  sbitmap_zero (considered);
-  EXECUTE_IF_SET_IN_BITMAP (blocks_to_consider, 0, index, bi)
-    {
-      SET_BIT (considered, index);
-    }
+#endif
 
   /* Initialize the mapping of block index to postorder.  */
   for (i = 0; i < n_blocks; i++)
     {
       bbindex_to_postorder[blocks_in_postorder[i]] = i;
+      /* Start with Age of 1.  Blocks not considered for dataflow
+         will all have age of 0 that will prevent us from computing
+	 confluence on them.  */
+      SET_BB_LAST_CHANGE_AGE (BASIC_BLOCK (blocks_in_postorder[i]), 1);
       /* Add all blocks to the worklist.  */
       bitmap_set_bit (pending, i);
     }
@@ -1100,11 +1108,14 @@  df_worklist_dataflow (struct dataflow *d
     dataflow->problem->init_fun (blocks_to_consider);
 
   /* Solve it.  */
-  df_worklist_dataflow_doublequeue (dataflow, pending, considered,
+  df_worklist_dataflow_doublequeue (dataflow, pending,
 				    blocks_in_postorder,
 				    bbindex_to_postorder,
 				    n_blocks);
-  sbitmap_free (considered);
+
+  for (i = 0; i < n_blocks; i++)
+    BASIC_BLOCK (blocks_in_postorder[i])->aux = NULL;
+
   free (bbindex_to_postorder);
 }