diff mbox

Improve handling of threads which cross over the current loops header

Message ID 528F0C4E.5050902@redhat.com
State New
Headers show

Commit Message

Jeff Law Nov. 22, 2013, 7:48 a.m. UTC
Right now jump threads from within a loop which cross the loop header, 
then terminate within the loop are ignored as this may create 
irreducible loops.

This patch gets the CFG/SSA updating code in good enough shape to handle 
the case that the embedded guys care about.  ie, the coremark FSA/FSM 
benchmark.

Basically we still ignore most of those threads -- the exception is when 
we find the loop header has a multi-way branch.  In that case we go 
ahead and thread the jump, throw away and rebuild the loop information. 
  Again, we only do this when we're able to remove a multi-way jump at 
the top of a loop.

During a GCC bootstrap I found two instances where this triggers, once 
in GCC itself, one in the java runtime libraries.  I've manually 
verified that we do the right thing in those two instances as well as 
the FSA/FSM optimization.  However, the reality is this code isn't being 
well exercised.

So (for testing purposes only) I removed the test that the loop header 
has to be a multi-way jump and enabled a follow-on patch which allows us 
to identify many more threads through the loop header to points within 
the current loop.  Not surprisingly the code triggered more often and 
uncovered a couple minor issues that I've addressed.

The bootstrap/regression test for the patch ran fine, as did the 
regression and bootstrap test with various pass limitations removed 
(with the exception of tests which explicitly test that we don't thread 
in situations where it'll create irreducible loops).  This is good.

Installed on the trunk.  One more patch to go before calling and end to 
my own development work for 4.9 :-)
* tree-ssa-threadupdate.c: Include tree-cfg.h and tree-pass.h
	(thread_block_1): Do not cancel jump threads which go from
	inside a loop, through the header, then back inside the loop.
	(bb_ends_with_multiway_branch): New function.
	(thread_through_all_blocks): Handle threading cases which start
	in a loop through the loop header to a point in the loop.

Comments

Richard Biener Nov. 22, 2013, 12:10 p.m. UTC | #1
On Fri, Nov 22, 2013 at 8:48 AM, Jeff Law <law@redhat.com> wrote:
>
> Right now jump threads from within a loop which cross the loop header, then
> terminate within the loop are ignored as this may create irreducible loops.
>
> This patch gets the CFG/SSA updating code in good enough shape to handle the
> case that the embedded guys care about.  ie, the coremark FSA/FSM benchmark.
>
> Basically we still ignore most of those threads -- the exception is when we
> find the loop header has a multi-way branch.  In that case we go ahead and
> thread the jump, throw away and rebuild the loop information.  Again, we
> only do this when we're able to remove a multi-way jump at the top of a
> loop.
>
> During a GCC bootstrap I found two instances where this triggers, once in
> GCC itself, one in the java runtime libraries.  I've manually verified that
> we do the right thing in those two instances as well as the FSA/FSM
> optimization.  However, the reality is this code isn't being well exercised.
>
> So (for testing purposes only) I removed the test that the loop header has
> to be a multi-way jump and enabled a follow-on patch which allows us to
> identify many more threads through the loop header to points within the
> current loop.  Not surprisingly the code triggered more often and uncovered
> a couple minor issues that I've addressed.
>
> The bootstrap/regression test for the patch ran fine, as did the regression
> and bootstrap test with various pass limitations removed (with the exception
> of tests which explicitly test that we don't thread in situations where
> it'll create irreducible loops).  This is good.
>
> Installed on the trunk.  One more patch to go before calling and end to my
> own development work for 4.9 :-)
>
>
>
>
>
>
>         * tree-ssa-threadupdate.c: Include tree-cfg.h and tree-pass.h
>         (thread_block_1): Do not cancel jump threads which go from
>         inside a loop, through the header, then back inside the loop.
>         (bb_ends_with_multiway_branch): New function.
>         (thread_through_all_blocks): Handle threading cases which start
>         in a loop through the loop header to a point in the loop.
>
>
>
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index 777fe41..38d4f69 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgloop.h"
>  #include "hash-table.h"
>  #include "dbgcnt.h"
> +#include "tree-cfg.h"
> +#include "tree-pass.h"
>
>  /* Given a block B, update the CFG and SSA graph to reflect redirecting
>     one or more in-edges to B to instead reach the destination of an
> @@ -815,29 +817,15 @@ thread_block_1 (basic_block bb, bool noloop_only, bool
> joiners)
>             }
>
>           /* Another case occurs when trying to thread through our
> -            own loop header, possibly from inside the loop.
> -
> -            If our loop header is buried in the path, then go ahead
> -            and cancel the jump threading request here.  This likely
> -            will need updating for the FSA/FSM coremark case.
> -
> -            Other cases (BB is the loop header) are handled elsewhere.  */
> +            own loop header, possibly from inside the loop.  We will
> +            thread these later.  */
>           unsigned int i;
>           for (i = 1; i < path->length (); i++)
>             {
>               if ((*path)[i]->e->src == bb->loop_father->header
>                   && (!loop_exit_edge_p (bb->loop_father, e2)
>                       || (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK))
> -               {
> -                 /* If i != 1, then it's a buried header that will not
> -                    be handled elsehwere.  */
> -                 if (i != 1)
> -                   {
> -                     delete_jump_thread_path (path);
> -                     e->aux = NULL;
> -                   }
> -                 break;
> -               }
> +               break;
>             }
>
>           if (i != path->length ())
> @@ -1554,6 +1542,20 @@ mark_threaded_blocks (bitmap threaded_blocks)
>  }
>
>
> +/* Return TRUE if BB ends with a switch statement or a computed goto.
> +   Otherwise return false.  */
> +static bool
> +bb_ends_with_multiway_branch (basic_block bb ATTRIBUTE_UNUSED)
> +{
> +  gimple stmt = last_stmt (bb);
> +  if (stmt && gimple_code (stmt) == GIMPLE_SWITCH)
> +    return true;
> +  if (stmt && gimple_code (stmt) == GIMPLE_GOTO
> +      && TREE_CODE (gimple_goto_dest (stmt)) == SSA_NAME)
> +    return true;
> +  return false;
> +}
> +
>  /* Walk through all blocks and thread incoming edges to the appropriate
>     outgoing edge for each edge pair recorded in THREADED_EDGES.
>
> @@ -1573,6 +1575,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
>    bitmap_iterator bi;
>    bitmap threaded_blocks;
>    struct loop *loop;
> +  bool totally_clobbered_loops = false;
>
>    /* We must know about loops in order to preserve them.  */
>    gcc_assert (current_loops != NULL);
> @@ -1609,35 +1612,79 @@ thread_through_all_blocks (bool
> may_peel_loop_headers)
>        retval |= thread_through_loop_header (loop, may_peel_loop_headers);
>      }
>
> -  /* Assume we had a jump thread path which went from the latch to the exit
> -     and a path which goes from outside to inside the same loop.
> -
> -     If the latch to exit was handled first, we will thread it and clear
> -     loop->header.
> -
> -     The second path will be ignored by thread_block because we're going
> -     through a loop header.  It will also be ignored by the loop above
> -     because loop->header is NULL.
> -
> -     This results in the second path never being threaded.  The failure
> -     mode is a dangling AUX field.
> -
> -     This is inherently a bit of a pain to fix, so we just walk all the
> -     blocks and all the incoming edges to those blocks and clear their
> -     AUX fields.  */
> +  /* Any jump threading paths that are still attached to edges at this
> +     point must be one of two cases.
> +
> +     First, we could have a jump threading path which went from outside
> +     a loop to inside a loop that was ignored because a prior jump thread
> +     across a backedge was realized (which indirectly causes the loop
> +     above to ignore the latter thread).  We can detect these because the
> +     loop structures will be different and we do not currently try to
> +     optimize this case.
> +
> +     Second, we could be threading across a backedge to a point within the
> +     same loop.  This occurrs for the FSA/FSM optimization and we would
> +     like to optimize it.  However, we have to be very careful as this
> +     may completely scramble the loop structures, with the result being
> +     irreducible loops causing us to throw away our loop structure.
> +
> +     As a compromise for the latter case, if the thread path ends in
> +     a block where the last statement is a multiway branch, then go
> +     ahead and thread it, else ignore it.  */
>    basic_block bb;
> -  edge_iterator ei;
>    edge e;
>    FOR_EACH_BB (bb)
>      {
> -      FOR_EACH_EDGE (e, ei, bb->preds)
> +      /* If we do end up threading here, we can remove elements from
> +        BB->preds.  Thus we can not use the FOR_EACH_EDGE iterator.  */
> +      for (edge_iterator ei = ei_start (bb->preds);
> +          (e = ei_safe_edge (ei));)
>         if (e->aux)
>           {
>             vec<jump_thread_edge *> *path = THREAD_PATH (e);
>
> -           delete_jump_thread_path (path);
> -           e->aux = NULL;
> +           /* Case 1, threading from outside to inside the loop
> +              after we'd already threaded through the header.  */
> +           if ((*path)[0]->e->dest->loop_father
> +               != path->last ()->e->src->loop_father)
> +             {
> +               delete_jump_thread_path (path);
> +               e->aux = NULL;
> +               ei_next (&ei);
> +             }
> +          else if (bb_ends_with_multiway_branch (path->last ()->e->src))
> +             {
> +               /* The code to thread through loop headers may have
> +                  split a block with jump threads attached to it.
> +
> +                  We can identify this with a disjoint jump threading
> +                  path.  If found, just remove it.  */
> +               for (unsigned int i = 0; i < path->length () - 1; i++)
> +                 if ((*path)[i]->e->dest != (*path)[i + 1]->e->src)
> +                   {
> +                     delete_jump_thread_path (path);
> +                     e->aux = NULL;
> +                     ei_next (&ei);
> +                     break;
> +                   }
> +
> +               /* Our path is still valid, thread it.  */
> +               if (e->aux)
> +                 {
> +                   totally_clobbered_loops
> +                     |= thread_block ((*path)[0]->e->dest, false);
> +                   e->aux = NULL;
> +                 }
> +             }
> +          else
> +             {
> +               delete_jump_thread_path (path);
> +               e->aux = NULL;
> +               ei_next (&ei);
> +             }
>           }
> +       else
> +         ei_next (&ei);
>      }
>
>    statistics_counter_event (cfun, "Jumps threaded",
> @@ -1649,7 +1696,32 @@ thread_through_all_blocks (bool
> may_peel_loop_headers)
>    threaded_blocks = NULL;
>    paths.release ();
>
> -  if (retval)
> +  /* If we made changes to the CFG that might have totally messed
> +     up the loop structure, then drop the old loop structure and
> +     rebuild.  */
> +  if (totally_clobbered_loops)
> +    {
> +      /* Release the current loop structures, they are totally
> +        clobbered at this point.  */
> +      loop_optimizer_finalize ();
> +      current_loops = NULL;

This is definitely a no-go and should be immediately reverted.  If you
wreck a particular loop simply mark it for removal by setting
->header and ->latch to NULL.  Loop fixup will then re-discover
it (or really remove it).

With the code above you discard all loops in the function including
meta-information on openmp loops, #pragma ivdeps info, etc.

Richard.

> +      /* Similarly for dominance information.  */
> +      free_dominance_info (CDI_DOMINATORS);
> +      free_dominance_info (CDI_POST_DOMINATORS);
> +
> +      /* Before we can rebuild the loop structures, we need dominators,
> +        which requires no unreachable code.  So remove unreachable code.
> */
> +      delete_unreachable_blocks ();
> +
> +      /* Now rebuild the loop structures.  */
> +      cfun->curr_properties &= ~PROP_loops;
> +      loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> +      cfun->curr_properties |= PROP_loops;
> +      retval = 1;
> +    }

> +
> +  if (retval && current_loops)
>      loops_state_set (LOOPS_NEED_FIXUP);
>
>    return retval;
>
Jeff Law Nov. 22, 2013, 3:48 p.m. UTC | #2
On 11/22/13 05:10, Richard Biener wrote:

>> +  if (totally_clobbered_loops)
>> +    {
>> +      /* Release the current loop structures, they are totally
>> +        clobbered at this point.  */
>> +      loop_optimizer_finalize ();
>> +      current_loops = NULL;
>
> This is definitely a no-go and should be immediately reverted.  If you
> wreck a particular loop simply mark it for removal by setting
> ->header and ->latch to NULL.  Loop fixup will then re-discover
> it (or really remove it).
>
> With the code above you discard all loops in the function including
> meta-information on openmp loops, #pragma ivdeps info, etc.
I realize it discards all that stuff, but that's still the safeest thing 
to do.

If we can selectively discard, that seems like a follow-up item.  The 
loop stuff is new and something I know little about.  Thus I chose the 
safest route.

So the issue here is we can create irreducible regions & new nested 
loops.  Does just setting the header,latch fields for the current loop 
handle those cases?

jeff
Richard Biener Nov. 22, 2013, 3:56 p.m. UTC | #3
Jeff Law <law@redhat.com> wrote:
>On 11/22/13 05:10, Richard Biener wrote:
>
>>> +  if (totally_clobbered_loops)
>>> +    {
>>> +      /* Release the current loop structures, they are totally
>>> +        clobbered at this point.  */
>>> +      loop_optimizer_finalize ();
>>> +      current_loops = NULL;
>>
>> This is definitely a no-go and should be immediately reverted.  If
>you
>> wreck a particular loop simply mark it for removal by setting
>> ->header and ->latch to NULL.  Loop fixup will then re-discover
>> it (or really remove it).
>>
>> With the code above you discard all loops in the function including
>> meta-information on openmp loops, #pragma ivdeps info, etc.
>I realize it discards all that stuff, but that's still the safeest
>thing 
>to do.
>
>If we can selectively discard, that seems like a follow-up item.  The 
>loop stuff is new and something I know little about.  Thus I chose the 
>safest route.

Well.  Safe here is arguably a bit optimistic ;-)

>So the issue here is we can create irreducible regions & new nested 
>loops.  Does just setting the header,latch fields for the current loop 
>handle those cases?

Yes.

Richard.

>jeff
Jeff Law Nov. 22, 2013, 4:29 p.m. UTC | #4
On 11/22/13 08:56, Richard Biener wrote:
>
>> So the issue here is we can create irreducible regions & new nested
>> loops.  Does just setting the header,latch fields for the current loop
>> handle those cases?
>
> Yes.
Good.  I'll take care of it.

Thanks,
Jeff
diff mbox

Patch

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 777fe41..38d4f69 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -35,6 +35,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "hash-table.h"
 #include "dbgcnt.h"
+#include "tree-cfg.h"
+#include "tree-pass.h"
 
 /* Given a block B, update the CFG and SSA graph to reflect redirecting
    one or more in-edges to B to instead reach the destination of an
@@ -815,29 +817,15 @@  thread_block_1 (basic_block bb, bool noloop_only, bool joiners)
 	    }
 
 	  /* Another case occurs when trying to thread through our
-	     own loop header, possibly from inside the loop.
-
-	     If our loop header is buried in the path, then go ahead
-	     and cancel the jump threading request here.  This likely
-	     will need updating for the FSA/FSM coremark case.
-
-	     Other cases (BB is the loop header) are handled elsewhere.  */
+	     own loop header, possibly from inside the loop.  We will
+	     thread these later.  */
 	  unsigned int i;
 	  for (i = 1; i < path->length (); i++)
 	    {
 	      if ((*path)[i]->e->src == bb->loop_father->header
 		  && (!loop_exit_edge_p (bb->loop_father, e2)
 		      || (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK))
-		{
-		  /* If i != 1, then it's a buried header that will not
-		     be handled elsehwere.  */
-		  if (i != 1)
-		    {
-		      delete_jump_thread_path (path);
-		      e->aux = NULL;
-		    }
-		  break;
-		}
+		break;
 	    }
 
 	  if (i != path->length ())
@@ -1554,6 +1542,20 @@  mark_threaded_blocks (bitmap threaded_blocks)
 }
 
 
+/* Return TRUE if BB ends with a switch statement or a computed goto.
+   Otherwise return false.  */
+static bool
+bb_ends_with_multiway_branch (basic_block bb ATTRIBUTE_UNUSED)
+{
+  gimple stmt = last_stmt (bb);
+  if (stmt && gimple_code (stmt) == GIMPLE_SWITCH)
+    return true;
+  if (stmt && gimple_code (stmt) == GIMPLE_GOTO
+      && TREE_CODE (gimple_goto_dest (stmt)) == SSA_NAME)
+    return true;
+  return false;
+}
+
 /* Walk through all blocks and thread incoming edges to the appropriate
    outgoing edge for each edge pair recorded in THREADED_EDGES.
 
@@ -1573,6 +1575,7 @@  thread_through_all_blocks (bool may_peel_loop_headers)
   bitmap_iterator bi;
   bitmap threaded_blocks;
   struct loop *loop;
+  bool totally_clobbered_loops = false;
 
   /* We must know about loops in order to preserve them.  */
   gcc_assert (current_loops != NULL);
@@ -1609,35 +1612,79 @@  thread_through_all_blocks (bool may_peel_loop_headers)
       retval |= thread_through_loop_header (loop, may_peel_loop_headers);
     }
 
-  /* Assume we had a jump thread path which went from the latch to the exit
-     and a path which goes from outside to inside the same loop.
-
-     If the latch to exit was handled first, we will thread it and clear
-     loop->header.
-
-     The second path will be ignored by thread_block because we're going
-     through a loop header.  It will also be ignored by the loop above
-     because loop->header is NULL.
-
-     This results in the second path never being threaded.  The failure
-     mode is a dangling AUX field.
-
-     This is inherently a bit of a pain to fix, so we just walk all the
-     blocks and all the incoming edges to those blocks and clear their
-     AUX fields.  */
+  /* Any jump threading paths that are still attached to edges at this
+     point must be one of two cases.
+
+     First, we could have a jump threading path which went from outside
+     a loop to inside a loop that was ignored because a prior jump thread
+     across a backedge was realized (which indirectly causes the loop
+     above to ignore the latter thread).  We can detect these because the
+     loop structures will be different and we do not currently try to
+     optimize this case.
+
+     Second, we could be threading across a backedge to a point within the
+     same loop.  This occurrs for the FSA/FSM optimization and we would
+     like to optimize it.  However, we have to be very careful as this
+     may completely scramble the loop structures, with the result being
+     irreducible loops causing us to throw away our loop structure.
+
+     As a compromise for the latter case, if the thread path ends in
+     a block where the last statement is a multiway branch, then go
+     ahead and thread it, else ignore it.  */
   basic_block bb;
-  edge_iterator ei;
   edge e;
   FOR_EACH_BB (bb)
     {
-      FOR_EACH_EDGE (e, ei, bb->preds)
+      /* If we do end up threading here, we can remove elements from
+	 BB->preds.  Thus we can not use the FOR_EACH_EDGE iterator.  */
+      for (edge_iterator ei = ei_start (bb->preds);
+	   (e = ei_safe_edge (ei));)
 	if (e->aux)
 	  {
 	    vec<jump_thread_edge *> *path = THREAD_PATH (e);
 
-	    delete_jump_thread_path (path);
-	    e->aux = NULL;
+	    /* Case 1, threading from outside to inside the loop
+	       after we'd already threaded through the header.  */
+	    if ((*path)[0]->e->dest->loop_father
+		!= path->last ()->e->src->loop_father)
+	      {
+		delete_jump_thread_path (path);
+		e->aux = NULL;
+		ei_next (&ei);
+	      }
+	   else if (bb_ends_with_multiway_branch (path->last ()->e->src))
+	      {
+		/* The code to thread through loop headers may have
+		   split a block with jump threads attached to it.
+
+		   We can identify this with a disjoint jump threading
+		   path.  If found, just remove it.  */
+		for (unsigned int i = 0; i < path->length () - 1; i++)
+		  if ((*path)[i]->e->dest != (*path)[i + 1]->e->src)
+		    {
+		      delete_jump_thread_path (path);
+		      e->aux = NULL;
+		      ei_next (&ei);
+		      break;
+		    }
+
+		/* Our path is still valid, thread it.  */
+	        if (e->aux)
+		  {
+		    totally_clobbered_loops
+		      |= thread_block ((*path)[0]->e->dest, false);
+		    e->aux = NULL;
+		  }
+	      }
+	   else
+	      {
+		delete_jump_thread_path (path);
+		e->aux = NULL;
+		ei_next (&ei);
+	      }
  	  }
+	else
+	  ei_next (&ei);
     }
 
   statistics_counter_event (cfun, "Jumps threaded",
@@ -1649,7 +1696,32 @@  thread_through_all_blocks (bool may_peel_loop_headers)
   threaded_blocks = NULL;
   paths.release ();
 
-  if (retval)
+  /* If we made changes to the CFG that might have totally messed
+     up the loop structure, then drop the old loop structure and
+     rebuild.  */
+  if (totally_clobbered_loops)
+    {
+      /* Release the current loop structures, they are totally
+	 clobbered at this point.  */
+      loop_optimizer_finalize ();
+      current_loops = NULL;
+
+      /* Similarly for dominance information.  */
+      free_dominance_info (CDI_DOMINATORS);
+      free_dominance_info (CDI_POST_DOMINATORS);
+
+      /* Before we can rebuild the loop structures, we need dominators,
+	 which requires no unreachable code.  So remove unreachable code.  */
+      delete_unreachable_blocks ();
+
+      /* Now rebuild the loop structures.  */
+      cfun->curr_properties &= ~PROP_loops;
+      loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+      cfun->curr_properties |= PROP_loops;
+      retval = 1;
+    }
+
+  if (retval && current_loops)
     loops_state_set (LOOPS_NEED_FIXUP);
 
   return retval;