diff mbox

0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch

Message ID 4C2A3A29.1070505@codesourcery.com
State New
Headers show

Commit Message

Maxim Kuvyrkov June 29, 2010, 6:23 p.m. UTC
On 6/24/10 7:53 PM, Maxim Kuvyrkov wrote:
...
> This updated patch corrects the cleaning routine and adds several
> comments to annotate its actions.

Ping.

Also, in case you haven't look at the patch yet, here is yet another 
version with a fix to potential miscompilation of code with EH. 
Otherwise the patch is the same.

A miscompilation can occur due to VBEout sets not filtering out 
expressions that die to due to abnormal control flow, these expression 
are represented in TRANSPOUT set.  This updated version (a) filters out 
VBEout sets with !TRANSPOUT and (b) adds a check to 
hoist_expr_reaches_here_p() that accounts for TRANSPOUT.  Previously, 
the check in hoist_code() would suffice because we never looked too far 
down the CFG.

Bootstrapped and regtested on {x86_64,i686,arm}-linux-gnu.

OK to check in?

Comments

Steven Bosscher June 29, 2010, 9:08 p.m. UTC | #1
On Tue, Jun 29, 2010 at 8:23 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> On 6/24/10 7:53 PM, Maxim Kuvyrkov wrote:
> ...
>>
>> This updated patch corrects the cleaning routine and adds several
>> comments to annotate its actions.
>
> Ping.
>
> Also, in case you haven't look at the patch yet, here is yet another version
> with a fix to potential miscompilation of code with EH. Otherwise the patch
> is the same.
>
> A miscompilation can occur due to VBEout sets not filtering out expressions
> that die to due to abnormal control flow, these expression are represented
> in TRANSPOUT set.  This updated version (a) filters out VBEout sets with
> !TRANSPOUT and (b) adds a check to hoist_expr_reaches_here_p() that accounts
> for TRANSPOUT.  Previously, the check in hoist_code() would suffice because
> we never looked too far down the CFG.

The ideal patch would remove TRANSPOUT and clean out those expressions
earlier, see what RTL PRE does.

If you clean that up first, this latest version 0003 patch will
probably look better/simpler.

Ciao!
Steven
Maxim Kuvyrkov June 30, 2010, 6:23 a.m. UTC | #2
On 6/30/10 1:08 AM, Steven Bosscher wrote:
...
> The ideal patch would remove TRANSPOUT and clean out those expressions
> earlier, see what RTL PRE does.

Removing TRANSPOUT would be a separate change and there's no need to 
mash it together with this patch.

Thanks,
Steven Bosscher June 30, 2010, 11:24 a.m. UTC | #3
On Wed, Jun 30, 2010 at 8:23 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> On 6/30/10 1:08 AM, Steven Bosscher wrote:
> ...
>>
>> The ideal patch would remove TRANSPOUT and clean out those expressions
>> earlier, see what RTL PRE does.
>
> Removing TRANSPOUT would be a separate change and there's no need to mash it
> together with this patch.

Indeed. This is why I said:

"If you clean that up first, this latest version 0003 patch will
probably look better/simpler."

Note the word "first".

But it's only a suggestion. It just seems to me that doing this would
simplify the verification of the implementation of your new algorithm
a bit easier. And that'd be a good thing...

Ciao!
Steven
Jeff Law June 30, 2010, 3:07 p.m. UTC | #4
On 06/29/10 12:23, Maxim Kuvyrkov wrote:
> On 6/24/10 7:53 PM, Maxim Kuvyrkov wrote:
> ...
>> This updated patch corrects the cleaning routine and adds several
>> comments to annotate its actions.
>
> Ping.
>
> Also, in case you haven't look at the patch yet, here is yet another 
> version with a fix to potential miscompilation of code with EH. 
> Otherwise the patch is the same.
>
> A miscompilation can occur due to VBEout sets not filtering out 
> expressions that die to due to abnormal control flow, these expression 
> are represented in TRANSPOUT set.  This updated version (a) filters 
> out VBEout sets with !TRANSPOUT and (b) adds a check to 
> hoist_expr_reaches_here_p() that accounts for TRANSPOUT.  Previously, 
> the check in hoist_code() would suffice because we never looked too 
> far down the CFG.
But aren't we just moving an expression that we were going to evaluate 
anyway from one block to another without splitting edges?  Doesn't that 
avoid the problems commonly associated with code motion with abnormal edges?

What am I missing?

jeff
Jeff Law June 30, 2010, 3:11 p.m. UTC | #5
On 06/29/10 15:08, Steven Bosscher wrote:
> On Tue, Jun 29, 2010 at 8:23 PM, Maxim Kuvyrkov<maxim@codesourcery.com>  wrote:
>    
>> On 6/24/10 7:53 PM, Maxim Kuvyrkov wrote:
>> ...
>>      
>>> This updated patch corrects the cleaning routine and adds several
>>> comments to annotate its actions.
>>>        
>> Ping.
>>
>> Also, in case you haven't look at the patch yet, here is yet another version
>> with a fix to potential miscompilation of code with EH. Otherwise the patch
>> is the same.
>>
>> A miscompilation can occur due to VBEout sets not filtering out expressions
>> that die to due to abnormal control flow, these expression are represented
>> in TRANSPOUT set.  This updated version (a) filters out VBEout sets with
>> !TRANSPOUT and (b) adds a check to hoist_expr_reaches_here_p() that accounts
>> for TRANSPOUT.  Previously, the check in hoist_code() would suffice because
>> we never looked too far down the CFG.
>>      
> The ideal patch would remove TRANSPOUT and clean out those expressions
> earlier, see what RTL PRE does.
>
> If you clean that up first, this latest version 0003 patch will
> probably look better/simpler.
>    
If there is a real issue here, the PRE approach is required for 
correctness.  TRANSPOUT misses a boatload of stuff (Integer DIV/MOD, 
most FP operations, etc).

What I'm struggling with is why we need this hair at all for hoisting 
since we're not inserting on edges.

jeff
Steven Bosscher June 30, 2010, 3:26 p.m. UTC | #6
On Wed, Jun 30, 2010 at 5:11 PM, Jeff Law <law@redhat.com> wrote:
> If there is a real issue here, the PRE approach is required for correctness.
>  TRANSPOUT misses a boatload of stuff (Integer DIV/MOD, most FP operations,
> etc).
>
> What I'm struggling with is why we need this hair at all for hoisting since
> we're not inserting on edges.

IIRC it has something to do with being unable to fixup the CFG if we
move a potentially throwing expression.

Ciao!
Steven
Jeff Law June 30, 2010, 3:29 p.m. UTC | #7
On 06/30/10 09:26, Steven Bosscher wrote:
> On Wed, Jun 30, 2010 at 5:11 PM, Jeff Law<law@redhat.com>  wrote:
>    
>> If there is a real issue here, the PRE approach is required for correctness.
>>   TRANSPOUT misses a boatload of stuff (Integer DIV/MOD, most FP operations,
>> etc).
>>
>> What I'm struggling with is why we need this hair at all for hoisting since
>> we're not inserting on edges.
>>      
> IIRC it has something to do with being unable to fixup the CFG if we
> move a potentially throwing expression.
>    
But there aren't any fixups to do -- the fixups are necessary when we 
insert on an abnormal edge.  For hoisting that never happens.

jeff
diff mbox

Patch

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index 135c0c2..1bf192d 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -854,6 +854,8 @@  extern VEC (basic_block, heap) *get_dominated_by (enum cdi_direction, basic_bloc
 extern VEC (basic_block, heap) *get_dominated_by_region (enum cdi_direction,
 							 basic_block *,
 							 unsigned);
+extern VEC (basic_block, heap) *get_dominated_to_depth (enum cdi_direction,
+							basic_block, int);
 extern VEC (basic_block, heap) *get_all_dominated_blocks (enum cdi_direction,
 							  basic_block);
 extern void add_to_dominance_info (enum cdi_direction, basic_block);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3d576bf..83d019c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8232,6 +8232,12 @@  an expression can travel.  This is currently supported only in
 code hoisting pass.  The lesser the cost, the more aggressive code hoisting
 will be.  The default value is 3.
 
+@item max-hoist-depth
+The depth of search in the dominator tree for expressions to hoist.
+This is used to avoid quadratic behavior in hoisting algorithm.
+The value of 0 will avoid limiting the search, but may slow down compilation
+of huge functions.  The default value is 30.
+
 @item max-unrolled-insns
 The maximum number of instructions that a loop should have if that loop
 is unrolled, and if the loop is unrolled, it determines how many times
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c2dcf0..7861439 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -783,16 +783,20 @@  get_dominated_by_region (enum cdi_direction dir, basic_block *region,
 }
 
 /* Returns the list of basic blocks including BB dominated by BB, in the
-   direction DIR.  The vector will be sorted in preorder.  */
+   direction DIR up to DEPTH in the dominator tree.  The DEPTH of zero will
+   produce a vector containing all dominated blocks.  The vector will be sorted
+   in preorder.  */
 
 VEC (basic_block, heap) *
-get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+get_dominated_to_depth (enum cdi_direction dir, basic_block bb, int depth)
 {
   VEC(basic_block, heap) *bbs = NULL;
   unsigned i;
+  unsigned next_level_start;
 
   i = 0;
   VEC_safe_push (basic_block, heap, bbs, bb);
+  next_level_start = 1; /* = VEC_length (basic_block, bbs); */
 
   do
     {
@@ -803,12 +807,24 @@  get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
 	   son;
 	   son = next_dom_son (dir, son))
 	VEC_safe_push (basic_block, heap, bbs, son);
+
+      if (i == next_level_start && --depth)
+	next_level_start = VEC_length (basic_block, bbs);
     }
-  while (i < VEC_length (basic_block, bbs));
+  while (i < next_level_start);
 
   return bbs;
 }
 
+/* Returns the list of basic blocks including BB dominated by BB, in the
+   direction DIR.  The vector will be sorted in preorder.  */
+
+VEC (basic_block, heap) *
+get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+{
+  return get_dominated_to_depth (dir, bb, 0);
+}
+
 /* Redirect all edges pointing to BB to TO.  */
 void
 redirect_immediate_dominators (enum cdi_direction dir, basic_block bb,
diff --git a/gcc/gcse.c b/gcc/gcse.c
index d734fa4..d00a788 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4219,6 +4219,7 @@  compute_code_hoist_vbeinout (void)
 {
   int changed, passes;
   basic_block bb;
+  sbitmap tmp1, tmp2;
 
   sbitmap_vector_zero (hoist_vbeout, last_basic_block);
   sbitmap_vector_zero (hoist_vbein, last_basic_block);
@@ -4235,8 +4236,14 @@  compute_code_hoist_vbeinout (void)
       FOR_EACH_BB_REVERSE (bb)
 	{
 	  if (bb->next_bb != EXIT_BLOCK_PTR)
-	    sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
-					   hoist_vbein, bb->index);
+	    {
+	      sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
+					     hoist_vbein, bb->index);
+
+	      /* Remove from VBEout expressions that die right after BB.  */
+	      sbitmap_a_and_b (hoist_vbeout[bb->index],
+			       hoist_vbeout[bb->index], transpout[bb->index]);
+	    }
 
 	  changed |= sbitmap_a_or_b_and_c_cg (hoist_vbein[bb->index],
 					      antloc[bb->index],
@@ -4248,7 +4255,144 @@  compute_code_hoist_vbeinout (void)
     }
 
   if (dump_file)
-    fprintf (dump_file, "hoisting vbeinout computation: %d passes\n", passes);
+    {
+      fprintf (dump_file, "hoisting vbeinout computation: %d passes\n", passes);
+
+      FOR_EACH_BB (bb)
+        {
+	  fprintf (dump_file, "vbein (%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbein[bb->index]);
+	  fprintf (dump_file, "vbeout(%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	}
+    }
+
+  /* Now cleanup VBEout to avoid moving expressions too far up.
+
+     We follow two rules to clean up VBEout[BB]:
+
+     1. If BB does not have any dominated blocks, nothing will ever be hoisted
+     to BB, so we can just wipe its VBEout clean.
+
+     2. If an expression can be hoisted both to BB and to a *single* successor
+     of BB in the dominator tree, then there is no point of hoisting
+     the expression to BB over BB's successor.  Doing otherwise would
+     unnecessarily extend live ranges.  */
+
+  /* Wipe VBEout of leaf blocks in the dominator tree.  */
+  FOR_EACH_BB (bb)
+    if (first_dom_son (CDI_DOMINATORS, bb) == NULL)
+      sbitmap_zero (hoist_vbeout[bb->index]);
+
+  tmp1 = sbitmap_alloc (expr_hash_table.n_elems);
+  tmp2 = sbitmap_alloc (expr_hash_table.n_elems);
+
+  /* We cannot cleanup VBEout in a single traversal.  There has to be both
+     upward and downward links when computing VBEout of current block to
+     avoid removing bits that shouldn't be removed.  E.g., consider
+     the following dominator tree; '*' marks blocks which compute same
+     expression, the expression can be freely moved; the expected result
+     is that we move computations of '*' from (3) and (6) to (2).
+
+       2
+      / \
+     3*  4
+        / \
+       5   6*
+
+     A walk over the above tree considering only downward links will first
+     remove '*' from VBEout[4] [as there's no point of hoisting
+     the expression to (4) if it's not computed in both (5) and (6)].
+     Then, when processing VBEout[2]. we won't see '*' as needed in (4),
+     so '*' will be removed from VBEout[2] too, leaving a copy of '*' in (3).
+
+     Therefore, we use iterative algorithm with both upward and downward
+     links to solve this problem.  The algorithm obviously converges as at
+     each iteration we make VBEout sets only smaller.  */
+
+  passes = 0;
+  changed = 1;
+
+  while (changed)
+    {
+      changed = 0;
+
+      FOR_EACH_BB (bb)
+        {
+	  basic_block son;
+	  bool first_p = true;
+
+	  /* Walk through dominated blocks and calculate the set of expressions
+	     that are needed in any one, and only one, of the blocks.
+	     TMP1 is the basis of what we want to remove from VBEout[BB].  */
+	  for (son = first_dom_son (CDI_DOMINATORS, bb);
+	       son != NULL;
+	       son = next_dom_son (CDI_DOMINATORS, son))
+	    {
+	      if (first_p)
+		{
+		  sbitmap_copy (tmp1, hoist_vbeout[son->index]);
+		  first_p = false;
+		}
+	      else
+		sbitmap_difference (tmp1, tmp1, hoist_vbeout[son->index]);
+	    }
+
+	  if (!first_p)
+	    {
+	      /* Now trim TMP1 to avoid removing too much.  */
+
+	      if (bb->prev_bb != ENTRY_BLOCK_PTR)
+		/* Remove epxressions from TMP1 that are needed upwards.
+		   These are VBEout[parent] minus expressions that are
+		   killed in BB (and, hence, don't get to VBEout[parent] from
+		   BB).  */
+		{
+		  basic_block parent;
+
+		  parent = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+		  /* Expressions killed in BB.  */
+		  sbitmap_not (tmp2, transp[bb->index]);
+
+		  /* Expressions that reach from PARENT to the end of BB.  */
+		  sbitmap_difference (tmp2, hoist_vbeout[parent->index],
+				      tmp2);
+
+		  sbitmap_difference (tmp1, tmp1, tmp2);
+		}
+
+	      /* Never remove any of expressions computed in BB from
+		 VBEout[BB].  */
+	      sbitmap_difference (tmp1, tmp1, comp[bb->index]);
+
+	      if (sbitmap_any_common_bits (hoist_vbeout[bb->index], tmp1))
+		/* There is at least one bit that can be removed from
+		   VBEout[BB].  */
+		{
+		  sbitmap_difference (hoist_vbeout[bb->index],
+				      hoist_vbeout[bb->index], tmp1);
+		  changed = 1;
+		}
+	    }
+	}
+
+      passes++;
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "hoisting vbeout cleanup: %d passes\n", passes);
+
+      FOR_EACH_BB (bb)
+        {
+	  fprintf (dump_file, "vbeout(%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	}
+    }
+
+  sbitmap_free (tmp1);
+  sbitmap_free (tmp2);
 }
 
 /* Top level routine to do the dataflow analysis needed by code hoisting.  */
@@ -4258,8 +4402,8 @@  compute_code_hoist_data (void)
 {
   compute_local_properties (transp, comp, antloc, &expr_hash_table);
   compute_transpout ();
-  compute_code_hoist_vbeinout ();
   calculate_dominance_info (CDI_DOMINATORS);
+  compute_code_hoist_vbeinout ();
   if (dump_file)
     fprintf (dump_file, "\n");
 }
@@ -4311,11 +4455,12 @@  hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
 
       if (pred->src == ENTRY_BLOCK_PTR)
 	break;
-      else if (pred_bb == expr_bb)
-	continue;
       else if (visited[pred_bb->index])
 	continue;
-
+      else if (!TEST_BIT (transpout[pred_bb->index], expr_index))
+	break;
+      else if (pred_bb == expr_bb)
+	continue;
       /* Does this predecessor generate this expression?  */
       else if (TEST_BIT (comp[pred_bb->index], expr_index))
 	break;
@@ -4403,15 +4548,15 @@  hoist_code (void)
       int found = 0;
       int insn_inserted_p;
 
-      domby = get_dominated_by (CDI_DOMINATORS, bb);
+      domby = get_dominated_to_depth (CDI_DOMINATORS, bb, MAX_HOIST_DEPTH);
+
       /* Examine each expression that is very busy at the exit of this
 	 block.  These are the potentially hoistable expressions.  */
       for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
 	{
 	  int hoistable = 0;
 
-	  if (TEST_BIT (hoist_vbeout[bb->index], i)
-	      && TEST_BIT (transpout[bb->index], i))
+	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
 	      /* We've found a potentially hoistable expression, now
 		 we look at every block BB dominates to see if it
@@ -4438,6 +4583,14 @@  hoist_code (void)
 		      occr = find_occr_in_bb (expr->antic_occr, dominated);
 		      gcc_assert (occr);
 
+		      /* An occurence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
+
 		      gcc_assert (NONDEBUG_INSN_P (occr->insn));
 
 		      /* Adjust MAX_DISTANCE to account for the fact that
@@ -4516,6 +4669,14 @@  hoist_code (void)
 		      occr = find_occr_in_bb (expr->antic_occr, dominated);
 		      gcc_assert (occr);
 
+		      /* An occurence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
+
 		      gcc_assert (NONDEBUG_INSN_P (occr->insn));
 
 		      /* Adjust MAX_DISTANCE to account for the fact that
@@ -4546,6 +4707,14 @@  hoist_code (void)
 			  gcc_assert (occr);
 			}
 
+		      /* An occurence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
+
 		      insn = occr->insn;
 		      set = single_set (insn);
 		      gcc_assert (set);
diff --git a/gcc/params.def b/gcc/params.def
index 551e8e2..22737a0 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -240,6 +240,14 @@  DEFPARAM(PARAM_GCSE_UNRESTRICTED_COST,
 	 "Cost at which GCSE optimizations will not constraint the distance an expression can travel",
 	 3, 0, 0)
 
+/* How deep from a given basic block the dominator tree should be searched
+   for expressions to hoist to the block.  The value of 0 will avoid limiting
+   the search.  */
+DEFPARAM(PARAM_MAX_HOIST_DEPTH,
+	 "max-hoist-depth",
+	 "Maximum depth of search in the dominator tree for expressions to hoist",
+	 30, 0, 0)
+
 /* This parameter limits the number of insns in a loop that will be unrolled,
    and by how much the loop is unrolled.
 
diff --git a/gcc/params.h b/gcc/params.h
index 174edc1..aa96c81 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -129,6 +129,8 @@  typedef enum compiler_param
   PARAM_VALUE (PARAM_GCSE_COST_DISTANCE_RATIO)
 #define GCSE_UNRESTRICTED_COST \
   PARAM_VALUE (PARAM_GCSE_UNRESTRICTED_COST)
+#define MAX_HOIST_DEPTH \
+  PARAM_VALUE (PARAM_MAX_HOIST_DEPTH)
 #define MAX_UNROLLED_INSNS \
   PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS)
 #define MAX_SMS_LOOP_NUMBER \