diff mbox

Moving backwards/FSM threader into its own pass

Message ID 40ef7284-eb9d-29c7-3af0-05d5de81e602@redhat.com
State New
Headers show

Commit Message

Jeff Law May 27, 2016, 4:30 p.m. UTC
It's been my plan since finally wrapping my head around Bodik's thesis 
to revamp how we handle jump threading to use some of the principles 
from his thesis.  In particular, the back substitution and 
simplification model feels like the right long term direction.

Sebastian's FSM threader was the first step on that path (gcc-5). 
Exploiting that threader for more than just FSM loops was the next big 
step (gcc-6).

This patch takes the next step -- dis-entangling that new jump threading 
code from the old threading code and VRP/DOM.

The key thing to realize here is that the backwards (FSM) jump threader 
does not inherently need the DOM tables nor the ASSERT_EXPRs from VRP to 
do its job.  ie, it can and should run completely independently of 
DOM/VRP (though one day it may exploit range information that a prior 
VRP pass has computed).

By moving the backwards threader into its own pass, we can run it prior 
to DOM/VRP, which allow DOM/VRP to work on a simpler CFG with larger 
extended basic blocks.

The removal of unexecutable paths before VRP also has the nice effect of 
also eliminating false positive warnings for some work Aldy is doing 
around out-of-bound array index warnings.

We can remove all the calls to the backwards threader from the old style 
threader.  The way the FSM bits wired into the old threader caused 
redundant path evaluations.  That can be easily fixed with the FSM bits 
in their own pass.  The net is a 25% reduction in paths examined by the 
FSM threader.

Finally, we ultimately end up threading more jumps.  I don't have the #s 
handy anymore, but when I ran this through my tester there was a clear 
decrease in the number of runtime jumps.

So what are the downsides.

With the threader in its own pass, we end up getting more calls into the 
CFG & SSA verification routines in a checking-enabled compiler.  So the 
compile-time improvement is lost for a checking-enabled compiler.

The backward threader does not combine identical jump threading paths 
with different starting edges into a single jump threading path with 
multiple entry points.  This is primarily a codesize issue, but can have 
a secondary effect on performance.  I know how to fix this and it's on 
the list for gcc-7 along with further cleanups.


Bootstrapped and regression tested on x86_64 linux.  Installing on the 
trunk momentarily.

Jeff
commit 35bd646a4834a68a49af9ccb5873362a0fc742ae
Author: Jeff Law <law@redhat.com>
Date:   Fri May 27 10:23:54 2016 -0600

    	* tree-ssa-threadedge.c: Remove include of tree-ssa-threadbackward.h.
    	(thread_across_edge): Remove calls to find_jump_threads_backwards.
    	* passes.def: Add jump threading passes before DOM/VRP.
    	* tree-ssa-threadbackward.c (find_jump_threads_backwards): Change
    	argument to a basic block from an edge.  Remove tests which are
    	handled elsewhere.
    	(pass_data_thread_jumps, class pass_thread_jumps): New.
    	(pass_thread_jumps::gate, pass_thread_jumps::execute): New.
    	(make_pass_thread_jumps): Likewise.
    	* tree-pass.h (make_pass_thread_jumps): Declare.
    
    	* gcc.dg/tree-ssa/pr21417.c: Update expected output.
    	* gcc.dg/tree-ssa/pr66752-3.c: Likewise.
    	* gcc.dg/tree-ssa/pr68198.c: Likewise.
    	* gcc.dg/tree-ssa/pr69196-1.c: Likewise.
    	* gcc.dg/tree-ssa/pr69270-3.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-dom-thread-2g.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-dom-thread-2h.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-dom-thread-13.c: Likewise.
    	* gcc.dg/tree-ssa/vrp56.c: Likewise.

Comments

Richard Biener May 30, 2016, 9:16 a.m. UTC | #1
On Fri, May 27, 2016 at 6:30 PM, Jeff Law <law@redhat.com> wrote:
>
> It's been my plan since finally wrapping my head around Bodik's thesis to
> revamp how we handle jump threading to use some of the principles from his
> thesis.  In particular, the back substitution and simplification model feels
> like the right long term direction.
>
> Sebastian's FSM threader was the first step on that path (gcc-5). Exploiting
> that threader for more than just FSM loops was the next big step (gcc-6).
>
> This patch takes the next step -- dis-entangling that new jump threading
> code from the old threading code and VRP/DOM.
>
> The key thing to realize here is that the backwards (FSM) jump threader does
> not inherently need the DOM tables nor the ASSERT_EXPRs from VRP to do its
> job.  ie, it can and should run completely independently of DOM/VRP (though
> one day it may exploit range information that a prior VRP pass has
> computed).
>
> By moving the backwards threader into its own pass, we can run it prior to
> DOM/VRP, which allow DOM/VRP to work on a simpler CFG with larger extended
> basic blocks.

Ok, but the placement (and number of) threading passes then no longer depends
on DOM/VRP passes - and as you placed the threading passes _before_ those
passes the threading itself does not benefit from DOM/VRP but only from
previous optimization passes.

I see this as opportunity to remove some of them ;)  I now see in the main
optimization pipeline

      NEXT_PASS (pass_fre);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);

position makes sense - FRE removed redundancies and fully copy/constant
propagated the IL.

      NEXT_PASS (pass_sra);
      /* The dom pass will also resolve all __builtin_constant_p calls
         that are still there to 0.  This has to be done after some
         propagations have already run, but before some more dead code
         is removed, and this place fits nicely.  Remember this when
         trying to move or duplicate pass_dominator somewhere earlier.  */
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);

this position OTOH doesn't make much sense as IL cleanup is missing
after SRA and previous opts.  After loop we now have

      NEXT_PASS (pass_tracer);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
      NEXT_PASS (pass_strlen);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);

I don't think we want two threadings so close together.  It makes some sense
to have a threading _after_ DOM but before VRP (DOM cleaned up the IL).

So that would leave two from your four passes and expose the opportunity
to re-add one during early-opts, also after FRE.  That one should be
throttled down to operate in "-Os" mode though.

So, can you see what removing the two threading passes that don't make
sense to me do to your statistics?  And check whether a -Os-like threading
can be done early?

Thanks,
Richard.

> The removal of unexecutable paths before VRP also has the nice effect of
> also eliminating false positive warnings for some work Aldy is doing around
> out-of-bound array index warnings.
>
> We can remove all the calls to the backwards threader from the old style
> threader.  The way the FSM bits wired into the old threader caused redundant
> path evaluations.  That can be easily fixed with the FSM bits in their own
> pass.  The net is a 25% reduction in paths examined by the FSM threader.
>
> Finally, we ultimately end up threading more jumps.  I don't have the #s
> handy anymore, but when I ran this through my tester there was a clear
> decrease in the number of runtime jumps.
>
> So what are the downsides.
>
> With the threader in its own pass, we end up getting more calls into the CFG
> & SSA verification routines in a checking-enabled compiler.  So the
> compile-time improvement is lost for a checking-enabled compiler.
>
> The backward threader does not combine identical jump threading paths with
> different starting edges into a single jump threading path with multiple
> entry points.  This is primarily a codesize issue, but can have a secondary
> effect on performance.  I know how to fix this and it's on the list for
> gcc-7 along with further cleanups.
>
>
> Bootstrapped and regression tested on x86_64 linux.  Installing on the trunk
> momentarily.
>
> Jeff
>
> commit 35bd646a4834a68a49af9ccb5873362a0fc742ae
> Author: Jeff Law <law@redhat.com>
> Date:   Fri May 27 10:23:54 2016 -0600
>
>         * tree-ssa-threadedge.c: Remove include of
> tree-ssa-threadbackward.h.
>         (thread_across_edge): Remove calls to find_jump_threads_backwards.
>         * passes.def: Add jump threading passes before DOM/VRP.
>         * tree-ssa-threadbackward.c (find_jump_threads_backwards): Change
>         argument to a basic block from an edge.  Remove tests which are
>         handled elsewhere.
>         (pass_data_thread_jumps, class pass_thread_jumps): New.
>         (pass_thread_jumps::gate, pass_thread_jumps::execute): New.
>         (make_pass_thread_jumps): Likewise.
>         * tree-pass.h (make_pass_thread_jumps): Declare.
>
>         * gcc.dg/tree-ssa/pr21417.c: Update expected output.
>         * gcc.dg/tree-ssa/pr66752-3.c: Likewise.
>         * gcc.dg/tree-ssa/pr68198.c: Likewise.
>         * gcc.dg/tree-ssa/pr69196-1.c: Likewise.
>         * gcc.dg/tree-ssa/pr69270-3.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dom-thread-2g.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dom-thread-2h.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dom-thread-12.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dom-thread-13.c: Likewise.
>         * gcc.dg/tree-ssa/vrp56.c: Likewise.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index f04d26d..40bac96 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,16 @@
> +2016-05-26  Jeff Law  <law@redhat.com>
> +
> +       * tree-ssa-threadedge.c: Remove include of
> tree-ssa-threadbackward.h.
> +       (thread_across_edge): Remove calls to find_jump_threads_backwards.
> +       * passes.def: Add jump threading passes before DOM/VRP.
> +       * tree-ssa-threadbackward.c (find_jump_threads_backwards): Change
> +       argument to a basic block from an edge.  Remove tests which are
> +       handled elsewhere.
> +       (pass_data_thread_jumps, class pass_thread_jumps): New.
> +       (pass_thread_jumps::gate, pass_thread_jumps::execute): New.
> +       (make_pass_thread_jumps): Likewise.
> +       * tree-pass.h (make_pass_thread_jumps): Declare.
> +
>  2016-05-27  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * config/visium/visium-protos.h (split_double_move): Rename into...
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 993ed28..3647e90 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -199,6 +199,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_return_slot);
>        NEXT_PASS (pass_fre);
>        NEXT_PASS (pass_merge_phi);
> +      NEXT_PASS (pass_thread_jumps);
>        NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
>        NEXT_PASS (pass_chkp_opt);
>        NEXT_PASS (pass_dce);
> @@ -218,6 +219,7 @@ along with GCC; see the file COPYING3.  If not see
>          propagations have already run, but before some more dead code
>          is removed, and this place fits nicely.  Remember this when
>          trying to move or duplicate pass_dominator somewhere earlier.  */
> +      NEXT_PASS (pass_thread_jumps);
>        NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
>        /* At this point the majority of const/copy propagations
>          are exposed.  Go ahead and identify paths that should never
> @@ -305,8 +307,10 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_strength_reduction);
>        NEXT_PASS (pass_split_paths);
>        NEXT_PASS (pass_tracer);
> +      NEXT_PASS (pass_thread_jumps);
>        NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
>        NEXT_PASS (pass_strlen);
> +      NEXT_PASS (pass_thread_jumps);
>        NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
>        /* The only const/copy propagation opportunities left after
>          DOM and VRP should be due to degenerate PHI nodes.  So rather than
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 00c2a99..5ded2d4 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,19 @@
> +2016-05-26  Jeff Law  <law@redhat.com>
> +
> +       * gcc.dg/tree-ssa/pr21417.c: Update expected output.
> +       * gcc.dg/tree-ssa/pr66752-3.c: Likewise.
> +       * gcc.dg/tree-ssa/pr68198.c: Likewise.
> +       * gcc.dg/tree-ssa/pr69196-1.c: Likewise.
> +       * gcc.dg/tree-ssa/pr69270-3.c: Likewise.
> +       * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
> +       * gcc.dg/tree-ssa/ssa-dom-thread-2g.c: Likewise.
> +       * gcc.dg/tree-ssa/ssa-dom-thread-2h.c: Likewise.
> +       * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Likewise.
> +       * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Likewise.
> +       * gcc.dg/tree-ssa/ssa-dom-thread-12.c: Likewise.
> +       * gcc.dg/tree-ssa/ssa-dom-thread-13.c: Likewise.
> +       * gcc.dg/tree-ssa/vrp56.c: Likewise.
> +
>  2016-05-27  Ville Voutilainen  <ville.voutilainen@gmail.com>
>
>         PR c++/69855
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21417.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr21417.c
> index c865ee3..4845119 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr21417.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21417.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-dom3-details" } */
> +/* { dg-options "-O2 -fdump-tree-thread4-details" } */
>
>  struct tree_common
>  {
> @@ -49,5 +49,5 @@ L23:
>  /* We should thread the backedge to the top of the loop; ie we only
>     execute the if (expr->common.code != 142) test once per loop
>     iteration.  */
> -/* { dg-final { scan-tree-dump-times "FSM jump thread" 1 "dom3" } } */
> +/* { dg-final { scan-tree-dump-times "FSM jump thread" 1 "thread4" } } */
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
> index 2949cbb..896c8bf 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
>
>  extern int status, pt;
>  extern int count;
> @@ -32,9 +32,10 @@ foo (int N, int c, int b, int *a)
>     pt--;
>  }
>
> -/* There are 3 FSM jump threading opportunities, all of which will be
> +/* There are 4 FSM jump threading opportunities, all of which will be
>     realized, which will eliminate testing of FLAG, completely.  */
> -/* { dg-final { scan-tree-dump-times "Registering FSM" 3 "vrp1"} } */
> +/* { dg-final { scan-tree-dump-times "Registering FSM" 4 "thread1"} } */
>
> -/* There should be no assignments or references to FLAG.  */
> -/* { dg-final { scan-tree-dump-not "flag" "optimized"} } */
> +/* There should be no assignments or references to FLAG, verify they're
> +   eliminated as early as possible.  */
> +/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
> index 227ffeb..d678dc8 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details" } */
>
>  extern void abort (void);
>
> @@ -39,5 +39,5 @@ c_finish_omp_clauses (tree clauses)
>
>  /* There are 3 FSM jump threading opportunities, two of which will
>    get filtered out.  */
> -/* { dg-final { scan-tree-dump-times "Registering FSM" 1 "vrp1"} } */
> -/* { dg-final { scan-tree-dump-times "FSM Thread through multiway branch
> without threading a multiway branch" 2 "vrp1"} } */
> +/* { dg-final { scan-tree-dump-times "Registering FSM" 1 "thread1"} } */
> +/* { dg-final { scan-tree-dump-times "FSM Thread through multiway branch
> without threading a multiway branch" 2 "thread1"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69196-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr69196-1.c
> index 415ac2e..5842e28 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr69196-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69196-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target sparc*-*-* x86_64-*-* } } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details" } */
>
> -/* { dg-final { scan-tree-dump "FSM did not thread around loop and would
> copy too many statements" "vrp1" } } */
> +/* { dg-final { scan-tree-dump "FSM did not thread around loop and would
> copy too many statements" "thread1" } } */
>
>
>  typedef __builtin_va_list __gnuc_va_list;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c
> index 89735f6..d546ac6 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c
> @@ -3,7 +3,7 @@
>
>  /* We're looking for a constant argument a PHI node.  There
>     should only be one if we unpropagate correctly.  */
> -/* { dg-final { scan-tree-dump-times ", 1" 1 "uncprop1"} } */
> +/* { dg-final { scan-tree-dump-times ", 1" 4 "uncprop1"} } */
>
>  typedef long unsigned int size_t;
>  typedef union gimple_statement_d *gimple;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
> index 0d40254..eb66136 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats" } */
>
>  void foo();
>  void bla();
> @@ -26,4 +26,4 @@ void thread_latch_through_header (void)
>     case.  And we want to thread through the header as well.  These
>     are both caught by threading in DOM.  */
>  /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
> -/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 1 "vrp1"} } */
> +/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2g.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2g.c
> index 6d1ff5d..ede03e0 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2g.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2g.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats" } */
> +/* { dg-options "-O2 -fdump-tree-dom2-details" } */
>
>  void foo();
>  void bla();
> @@ -22,5 +22,8 @@ void dont_thread_1 (void)
>      } while (i++ < 100);
>  }
>
> -/* { dg-final { scan-tree-dump "Jumps threaded: 2" "vrp1"} } */
> -/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2"} } */
> +/* This one can only be threaded if both paths to the
> +   conditional inside the loop are threaded at the same
> +   time.  Else we potentially end up with irreducible
> +   loops.  */
> +/* { dg-final { scan-tree-dump-not "IRREDUCIBLE_LOOP" "dom2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2h.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2h.c
> index 61705e1..043918d 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2h.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2h.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats" } */
> +/* { dg-options "-O2 -fdump-tree-dom2-details" } */
>
>  void foo();
>  void bla();
> @@ -25,5 +25,4 @@ void dont_thread_2 (int first)
>
>  /* Peeling off the first iteration would make threading through
>     the loop latch safe, but we don't do that currently.  */
> -/* { dg-final { scan-tree-dump "Jumps threaded: 1" "vrp1"} } */
> -/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2"} } */
> +/* { dg-final { scan-tree-dump-not "IRREDUCIBLE_LOOP" "dom2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> index ab6b6eb..5ec4687 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-dom2-details" } */
> -/* { dg-final { scan-tree-dump-times "FSM" 6 "dom2" } } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details
> -fdump-tree-thread2-details" } */
> +/* { dg-final { scan-tree-dump-times "FSM" 3 "thread1" } } */
> +/* { dg-final { scan-tree-dump-times "FSM" 4 "thread2" } } */
>
>  int sum0, sum1, sum2, sum3;
>  int foo (char *s, char **ret)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> index a7a737b..7cd1451 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> @@ -1,7 +1,8 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats
> -fdump-tree-dom3-stats -fdump-tree-vrp2-stats" } */
> -/* { dg-final { scan-tree-dump "Jumps threaded: 19"  "vrp1" } } */
> -/* { dg-final { scan-tree-dump "Jumps threaded: 12" "dom2" } } */
> +/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats
> -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats" }
> */
> +/* { dg-final { scan-tree-dump "Jumps threaded: 16"  "thread1" } } */
> +/* { dg-final { scan-tree-dump "Jumps threaded: 11" "thread2" } } */
> +/* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
>  /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" } } */
>  /* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" } } */
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c
> index f250d22..fa6da7b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c
> @@ -1,6 +1,8 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-dom2-details" } */
> -/* { dg-final { scan-tree-dump "FSM" "dom2" } } */
> +/* { dg-options "-O2 -fdump-tree-thread2-details
> -fdump-tree-thread3-details -fdump-tree-thread4-details" } */
> +/* { dg-final { scan-tree-dump "FSM" "thread2" } } */
> +/* { dg-final { scan-tree-dump "FSM" "thread3" } } */
> +/* { dg-final { scan-tree-dump "FSM" "thread4" } } */
>
>  typedef struct bitmap_head_def *bitmap;
>  typedef const struct bitmap_head_def *const_bitmap;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
> index 99d45f5..f5f338b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
> -/* { dg-final { scan-tree-dump "FSM" "vrp1" } } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details" } */
> +/* { dg-final { scan-tree-dump "FSM" "thread1" } } */
>
>  typedef struct rtx_def *rtx;
>  typedef const struct rtx_def *const_rtx;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp56.c
> b/gcc/testsuite/gcc.dg/tree-ssa/vrp56.c
> index 481e888..3a64b01 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp56.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp56.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-stats" } */
>  typedef struct basic_block_def *basic_block;
>  struct basic_block_def;
>  struct edge_def;
> @@ -38,5 +38,5 @@ cleanup_empty_eh (basic_block bb)
>         foo ();
>      }
>  }
> -/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp1"} } */
> +/* { dg-final { scan-tree-dump "Jumps threaded: 1" "thread1"} } */
>
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 170ee9a..362aa6e 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -156,6 +156,7 @@ DEFTIMEVAR (TV_TREE_SSA_OTHER            , "tree SSA
> other")
>  DEFTIMEVAR (TV_TREE_SSA_INCREMENTAL  , "tree SSA incremental")
>  DEFTIMEVAR (TV_TREE_OPS                     , "tree operand scan")
>  DEFTIMEVAR (TV_TREE_SSA_DOMINATOR_OPTS   , "dominator optimization")
> +DEFTIMEVAR (TV_TREE_SSA_THREAD_JUMPS , "backwards jump threading")
>  DEFTIMEVAR (TV_TREE_SRA              , "tree SRA")
>  DEFTIMEVAR (TV_ISOLATE_ERRONEOUS_PATHS    , "isolate eroneous paths")
>  DEFTIMEVAR (TV_TREE_CCP                     , "tree CCP")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 66e103a..36299a6 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -398,6 +398,7 @@ extern gimple_opt_pass *make_pass_dce (gcc::context
> *ctxt);
>  extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_thread_jumps (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_pre (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
> index 636b67d..57f1f5c 100644
> --- a/gcc/tree-ssa-threadbackward.c
> +++ b/gcc/tree-ssa-threadbackward.c
> @@ -33,6 +33,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "cfganal.h"
>  #include "tree-pass.h"
> +#include "gimple-ssa.h"
> +#include "tree-phinodes.h"
>
>  static int max_threaded_paths;
>
> @@ -596,15 +598,13 @@ fsm_find_control_statement_thread_paths (tree name,
>     finding a path where NAME is a constant, we can thread the path.  */
>
>  void
> -find_jump_threads_backwards (edge e)
> +find_jump_threads_backwards (basic_block bb)
>  {
>    if (!flag_expensive_optimizations
> -      || optimize_function_for_size_p (cfun)
> -      || e->dest->loop_father != e->src->loop_father
> -      || loop_depth (e->dest->loop_father) == 0)
> +      || optimize_function_for_size_p (cfun))
>      return;
>
> -  gimple *stmt = get_gimple_control_stmt (e->dest);
> +  gimple *stmt = get_gimple_control_stmt (bb);
>    if (!stmt)
>      return;
>
> @@ -628,7 +628,7 @@ find_jump_threads_backwards (edge e)
>
>    vec<basic_block, va_gc> *bb_path;
>    vec_alloc (bb_path, 10);
> -  vec_safe_push (bb_path, e->dest);
> +  vec_safe_push (bb_path, bb);
>    hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
>
>    max_threaded_paths = PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATHS);
> @@ -637,3 +637,60 @@ find_jump_threads_backwards (edge e)
>    delete visited_bbs;
>    vec_free (bb_path);
>  }
> +
> +namespace {
> +
> +const pass_data pass_data_thread_jumps =
> +{
> +  GIMPLE_PASS,
> +  "thread",
> +  OPTGROUP_NONE,
> +  TV_TREE_SSA_THREAD_JUMPS,
> +  ( PROP_cfg | PROP_ssa ),
> +  0,
> +  0,
> +  0,
> +  ( TODO_cleanup_cfg | TODO_update_ssa ),
> +};
> +
> +class pass_thread_jumps : public gimple_opt_pass
> +{
> +public:
> +  pass_thread_jumps (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_thread_jumps, ctxt)
> +  {}
> +
> +  opt_pass * clone (void) { return new pass_thread_jumps (m_ctxt); }
> +  virtual bool gate (function *);
> +  virtual unsigned int execute (function *);
> +};
> +
> +bool
> +pass_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED)
> +{
> +  return (flag_expensive_optimizations
> +         && ! optimize_function_for_size_p (cfun));
> +}
> +
> +
> +unsigned int
> +pass_thread_jumps::execute (function *fun)
> +{
> +  /* Try to thread each block with more than one successor.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      if (EDGE_COUNT (bb->succs) > 1)
> +       find_jump_threads_backwards (bb);
> +    }
> +  thread_through_all_blocks (true);
> +  return 0;
> +}
> +
> +}
> +
> +gimple_opt_pass *
> +make_pass_thread_jumps (gcc::context *ctxt)
> +{
> +  return new pass_thread_jumps (ctxt);
> +}
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index eca3812..5fd5b98 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -34,7 +34,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "params.h"
>  #include "tree-ssa-scopedtables.h"
>  #include "tree-ssa-threadedge.h"
> -#include "tree-ssa-threadbackward.h"
>  #include "tree-ssa-dom.h"
>  #include "gimple-fold.h"
>
> @@ -1183,8 +1182,6 @@ thread_across_edge (gcond *dummy_cond,
>        path->release ();
>        delete path;
>
> -      find_jump_threads_backwards (e);
> -
>        /* A negative status indicates the target block was deemed too big to
>          duplicate.  Just quit now rather than trying to use the block as
>          a joiner in a jump threading path.
> @@ -1231,10 +1228,7 @@ thread_across_edge (gcond *dummy_cond,
>        {
>         if ((e->flags & EDGE_DFS_BACK) != 0
>             || (taken_edge->flags & EDGE_DFS_BACK) != 0)
> -         {
> -           find_jump_threads_backwards (taken_edge);
> -           continue;
> -         }
> +         continue;
>
>         /* Push a fresh marker so we can unwind the equivalences created
>            for each of E->dest's successors.  */
> @@ -1282,10 +1276,7 @@ thread_across_edge (gcond *dummy_cond,
>             register_jump_thread (path);
>           }
>         else
> -         {
> -           find_jump_threads_backwards (path->last ()->e);
> -           delete_jump_thread_path (path);
> -         }
> +         delete_jump_thread_path (path);
>
>         /* And unwind the equivalence table.  */
>         if (avail_exprs_stack)
>
Jeff Law May 31, 2016, 2:55 p.m. UTC | #2
On 05/30/2016 03:16 AM, Richard Biener wrote:
>
> Ok, but the placement (and number of) threading passes then no longer depends
> on DOM/VRP passes - and as you placed the threading passes _before_ those
> passes the threading itself does not benefit from DOM/VRP but only from
> previous optimization passes.
Right.  Note that number of passes now is actually the same as we had 
before, they're just occurring outside DOM/VRP.

The backwards threader's only dependency on DOM/VRP was to propagate 
constants into PHI nodes and to propagate away copies.  That dependency 
was removed.


>
> I see this as opportunity to remove some of them ;)  I now see in the main
> optimization pipeline
>
>       NEXT_PASS (pass_fre);
>       NEXT_PASS (pass_thread_jumps);
>       NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
>
> position makes sense - FRE removed redundancies and fully copy/constant
> propagated the IL.
>
>       NEXT_PASS (pass_sra);
>       /* The dom pass will also resolve all __builtin_constant_p calls
>          that are still there to 0.  This has to be done after some
>          propagations have already run, but before some more dead code
>          is removed, and this place fits nicely.  Remember this when
>          trying to move or duplicate pass_dominator somewhere earlier.  */
>       NEXT_PASS (pass_thread_jumps);
>       NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
>
> this position OTOH doesn't make much sense as IL cleanup is missing
> after SRA and previous opts.  After loop we now have
We should look at this one closely.  The backwards threader doesn't 
depend on IL cleanups.  It should do its job regardless of the state of 
the IL.


>
>       NEXT_PASS (pass_tracer);
>       NEXT_PASS (pass_thread_jumps);
>       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
>       NEXT_PASS (pass_strlen);
>       NEXT_PASS (pass_thread_jumps);
>       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
>
> I don't think we want two threadings so close together.  It makes some sense
> to have a threading _after_ DOM but before VRP (DOM cleaned up the IL).
That one is, IMHO, the least useful.  I haven't done any significant 
analysis of this specific instance though to be sure.  The step you saw 
was meant to largely preserve behavior.  Further cleanups are definitely 
possible.

The most common case I've seen where the DOM/VRP make transformations 
that then expose something useful to the backward threader come from 
those pesky context sensitive equivalences.

We (primarily Andrew, but Aldy and myself are also involved) are looking 
at ways to more generally expose range information created for these 
situations.  Exposing range information and getting it more precise by 
allowing "unnecessary copies" or some such would eliminate those cases 
where DOM/VRP expose new opportunities for the backwards jump threader.




>
> So that would leave two from your four passes and expose the opportunity
> to re-add one during early-opts, also after FRE.  That one should be
> throttled down to operate in "-Os" mode though.
I'll take a look at them, but with some personal stuff and PTO it'll 
likely be a few weeks before I've got anything useful.

>
> So, can you see what removing the two threading passes that don't make
> sense to me do to your statistics?  And check whether a -Os-like threading
> can be done early?
Interesting you should mention doing threading early -- that was one of 
the secondary motivations behind getting the backwards threading bits 
out into their own pass, I just failed to mention it.

Essentially we want to limit the backwards substitution to single step 
within a single block for that case (which is trivially easy).  That 
would allow us to run a very cheap threader during early optimizations.

Jeff
Richard Biener May 31, 2016, 5:38 p.m. UTC | #3
On May 31, 2016 4:55:36 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 05/30/2016 03:16 AM, Richard Biener wrote:
>>
>> Ok, but the placement (and number of) threading passes then no longer
>depends
>> on DOM/VRP passes - and as you placed the threading passes _before_
>those
>> passes the threading itself does not benefit from DOM/VRP but only
>from
>> previous optimization passes.
>Right.  Note that number of passes now is actually the same as we had 
>before, they're just occurring outside DOM/VRP.
>
>The backwards threader's only dependency on DOM/VRP was to propagate 
>constants into PHI nodes and to propagate away copies.  That dependency
>
>was removed.
>
>
>>
>> I see this as opportunity to remove some of them ;)  I now see in the
>main
>> optimization pipeline
>>
>>       NEXT_PASS (pass_fre);
>>       NEXT_PASS (pass_thread_jumps);
>>       NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
>>
>> position makes sense - FRE removed redundancies and fully
>copy/constant
>> propagated the IL.
>>
>>       NEXT_PASS (pass_sra);
>>       /* The dom pass will also resolve all __builtin_constant_p
>calls
>>          that are still there to 0.  This has to be done after some
>>          propagations have already run, but before some more dead
>code
>>          is removed, and this place fits nicely.  Remember this when
>>          trying to move or duplicate pass_dominator somewhere
>earlier.  */
>>       NEXT_PASS (pass_thread_jumps);
>>       NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
>>
>> this position OTOH doesn't make much sense as IL cleanup is missing
>> after SRA and previous opts.  After loop we now have
>We should look at this one closely.  The backwards threader doesn't 
>depend on IL cleanups.  It should do its job regardless of the state of
>
>the IL.
>
>
>>
>>       NEXT_PASS (pass_tracer);
>>       NEXT_PASS (pass_thread_jumps);
>>       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p
>*/);
>>       NEXT_PASS (pass_strlen);
>>       NEXT_PASS (pass_thread_jumps);
>>       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
>>
>> I don't think we want two threadings so close together.  It makes
>some sense
>> to have a threading _after_ DOM but before VRP (DOM cleaned up the
>IL).
>That one is, IMHO, the least useful.  I haven't done any significant 
>analysis of this specific instance though to be sure.  The step you saw
>
>was meant to largely preserve behavior.  Further cleanups are
>definitely 
>possible.
>
>The most common case I've seen where the DOM/VRP make transformations 
>that then expose something useful to the backward threader come from 
>those pesky context sensitive equivalences.
>
>We (primarily Andrew, but Aldy and myself are also involved) are
>looking 
>at ways to more generally expose range information created for these 
>situations.  Exposing range information and getting it more precise by 
>allowing "unnecessary copies" or some such would eliminate those cases 
>where DOM/VRP expose new opportunities for the backwards jump threader.
>
>
>
>
>>
>> So that would leave two from your four passes and expose the
>opportunity
>> to re-add one during early-opts, also after FRE.  That one should be
>> throttled down to operate in "-Os" mode though.
>I'll take a look at them, but with some personal stuff and PTO it'll 
>likely be a few weeks before I've got anything useful.
>
>>
>> So, can you see what removing the two threading passes that don't
>make
>> sense to me do to your statistics?  And check whether a -Os-like
>threading
>> can be done early?
>Interesting you should mention doing threading early -- that was one of
>
>the secondary motivations behind getting the backwards threading bits 
>out into their own pass, I just failed to mention it.
>
>Essentially we want to limit the backwards substitution to single step 
>within a single block for that case (which is trivially easy).  That 
>would allow us to run a very cheap threader during early optimizations.

Just do double check - the pass does both forward and backward threading and DOM/VRP now do neither themselves?

Richard.

>Jeff
Jeff Law May 31, 2016, 9:12 p.m. UTC | #4
On 05/31/2016 11:38 AM, Richard Biener wrote:

>> Essentially we want to limit the backwards substitution to single step
>> within a single block for that case (which is trivially easy).  That
>> would allow us to run a very cheap threader during early optimizations.
>
> Just do double check - the pass does both forward and backward threading and DOM/VRP now do neither themselves?
Will do.  It's pretty easy.  I suspect there's a lot of low hanging fruit.

The backward pass is strictly backward substitution & simplification and 
independent of DOM/VRP.  I believe we're ultimately going to want a 
param or something to determine how far back it goes in the IL.

The old threader (still called from VRP/DOM) is primarily forward based 
using equivalence tables and ASSERT_EXPRs to simplify expressions as it 
walks statements in the block.

I'm pretty sure everything done by the old threader can be more easily 
and efficiently modeled in the backwards threader.  One a couple 
infrastructure items are addressed, I want to start looking at cutting 
back on the amount of work done in the old threader with the goal of 
starting to eliminate calls into it completely.

Jeff
Richard Biener June 1, 2016, 9:46 a.m. UTC | #5
On Tue, May 31, 2016 at 11:12 PM, Jeff Law <law@redhat.com> wrote:
> On 05/31/2016 11:38 AM, Richard Biener wrote:
>
>>> Essentially we want to limit the backwards substitution to single step
>>> within a single block for that case (which is trivially easy).  That
>>> would allow us to run a very cheap threader during early optimizations.
>>
>>
>> Just do double check - the pass does both forward and backward threading
>> and DOM/VRP now do neither themselves?
>
> Will do.  It's pretty easy.  I suspect there's a lot of low hanging fruit.
>
> The backward pass is strictly backward substitution & simplification and
> independent of DOM/VRP.  I believe we're ultimately going to want a param or
> something to determine how far back it goes in the IL.
>
> The old threader (still called from VRP/DOM) is primarily forward based
> using equivalence tables and ASSERT_EXPRs to simplify expressions as it
> walks statements in the block.
>
> I'm pretty sure everything done by the old threader can be more easily and
> efficiently modeled in the backwards threader.  One a couple infrastructure
> items are addressed, I want to start looking at cutting back on the amount
> of work done in the old threader with the goal of starting to eliminate
> calls into it completely.

Maybe to rephrase my question - DOM/VRP now do no threading anymore?
And the "backwards" threader is named "backwards" due to its implementation
and not because it only threads across backedges?

Richard.

> Jeff
Martin Liška June 9, 2016, 11:18 a.m. UTC | #6
Hello.

The patch caused:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71466

Thanks,
Martin
Jeff Law June 9, 2016, 5:50 p.m. UTC | #7
On 06/09/2016 05:18 AM, Martin Liška wrote:
> Hello.
>
> The patch caused:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71466
THanks.  I've got a few issues to address related to that change -- I've 
just been swamped with some personal stuff the last week.

I'm seriously considering reverting the change at this point, addressing 
the issues, then re-installing as I expect to stay pretty busy in the 
immediate future.

jeff
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f04d26d..40bac96 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@ 
+2016-05-26  Jeff Law  <law@redhat.com>
+
+	* tree-ssa-threadedge.c: Remove include of tree-ssa-threadbackward.h.
+	(thread_across_edge): Remove calls to find_jump_threads_backwards.
+	* passes.def: Add jump threading passes before DOM/VRP.
+	* tree-ssa-threadbackward.c (find_jump_threads_backwards): Change
+	argument to a basic block from an edge.  Remove tests which are
+	handled elsewhere.
+	(pass_data_thread_jumps, class pass_thread_jumps): New.
+	(pass_thread_jumps::gate, pass_thread_jumps::execute): New.
+	(make_pass_thread_jumps): Likewise.
+	* tree-pass.h (make_pass_thread_jumps): Declare.
+
 2016-05-27  Eric Botcazou  <ebotcazou@adacore.com>
 
 	* config/visium/visium-protos.h (split_double_move): Rename into...
diff --git a/gcc/passes.def b/gcc/passes.def
index 993ed28..3647e90 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -199,6 +199,7 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_return_slot);
       NEXT_PASS (pass_fre);
       NEXT_PASS (pass_merge_phi);
+      NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
       NEXT_PASS (pass_chkp_opt);
       NEXT_PASS (pass_dce);
@@ -218,6 +219,7 @@  along with GCC; see the file COPYING3.  If not see
 	 propagations have already run, but before some more dead code
 	 is removed, and this place fits nicely.  Remember this when
 	 trying to move or duplicate pass_dominator somewhere earlier.  */
+      NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
       /* At this point the majority of const/copy propagations
 	 are exposed.  Go ahead and identify paths that should never
@@ -305,8 +307,10 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_split_paths);
       NEXT_PASS (pass_tracer);
+      NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
       NEXT_PASS (pass_strlen);
+      NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
       /* The only const/copy propagation opportunities left after
 	 DOM and VRP should be due to degenerate PHI nodes.  So rather than
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 00c2a99..5ded2d4 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,19 @@ 
+2016-05-26  Jeff Law  <law@redhat.com>
+
+	* gcc.dg/tree-ssa/pr21417.c: Update expected output.
+	* gcc.dg/tree-ssa/pr66752-3.c: Likewise.
+	* gcc.dg/tree-ssa/pr68198.c: Likewise.
+	* gcc.dg/tree-ssa/pr69196-1.c: Likewise.
+	* gcc.dg/tree-ssa/pr69270-3.c: Likewise.
+	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
+	* gcc.dg/tree-ssa/ssa-dom-thread-2g.c: Likewise.
+	* gcc.dg/tree-ssa/ssa-dom-thread-2h.c: Likewise.
+	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Likewise.
+	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Likewise.
+	* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Likewise.
+	* gcc.dg/tree-ssa/ssa-dom-thread-13.c: Likewise.
+	* gcc.dg/tree-ssa/vrp56.c: Likewise.
+
 2016-05-27  Ville Voutilainen  <ville.voutilainen@gmail.com>
 
 	PR c++/69855
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21417.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21417.c
index c865ee3..4845119 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr21417.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21417.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom3-details" } */
+/* { dg-options "-O2 -fdump-tree-thread4-details" } */
 
 struct tree_common 
 { 
@@ -49,5 +49,5 @@  L23:
 /* We should thread the backedge to the top of the loop; ie we only
    execute the if (expr->common.code != 142) test once per loop
    iteration.  */
-/* { dg-final { scan-tree-dump-times "FSM jump thread" 1 "dom3" } } */
+/* { dg-final { scan-tree-dump-times "FSM jump thread" 1 "thread4" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index 2949cbb..896c8bf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
 
 extern int status, pt;
 extern int count;
@@ -32,9 +32,10 @@  foo (int N, int c, int b, int *a)
    pt--;
 }
 
-/* There are 3 FSM jump threading opportunities, all of which will be
+/* There are 4 FSM jump threading opportunities, all of which will be
    realized, which will eliminate testing of FLAG, completely.  */
-/* { dg-final { scan-tree-dump-times "Registering FSM" 3 "vrp1"} } */
+/* { dg-final { scan-tree-dump-times "Registering FSM" 4 "thread1"} } */
 
-/* There should be no assignments or references to FLAG.  */
-/* { dg-final { scan-tree-dump-not "flag" "optimized"} } */
+/* There should be no assignments or references to FLAG, verify they're
+   eliminated as early as possible.  */
+/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
index 227ffeb..d678dc8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details" } */
 
 extern void abort (void);
 
@@ -39,5 +39,5 @@  c_finish_omp_clauses (tree clauses)
 
 /* There are 3 FSM jump threading opportunities, two of which will
   get filtered out.  */
-/* { dg-final { scan-tree-dump-times "Registering FSM" 1 "vrp1"} } */
-/* { dg-final { scan-tree-dump-times "FSM Thread through multiway branch without threading a multiway branch" 2 "vrp1"} } */
+/* { dg-final { scan-tree-dump-times "Registering FSM" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "FSM Thread through multiway branch without threading a multiway branch" 2 "thread1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69196-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr69196-1.c
index 415ac2e..5842e28 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr69196-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69196-1.c
@@ -1,7 +1,7 @@ 
 /* { dg-do compile { target sparc*-*-* x86_64-*-* } } */
-/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details" } */
 
-/* { dg-final { scan-tree-dump "FSM did not thread around loop and would copy too many statements" "vrp1" } } */
+/* { dg-final { scan-tree-dump "FSM did not thread around loop and would copy too many statements" "thread1" } } */
 
 
 typedef __builtin_va_list __gnuc_va_list;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c
index 89735f6..d546ac6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c
@@ -3,7 +3,7 @@ 
 
 /* We're looking for a constant argument a PHI node.  There
    should only be one if we unpropagate correctly.  */
-/* { dg-final { scan-tree-dump-times ", 1" 1 "uncprop1"} } */
+/* { dg-final { scan-tree-dump-times ", 1" 4 "uncprop1"} } */
 
 typedef long unsigned int size_t;
 typedef union gimple_statement_d *gimple;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index 0d40254..eb66136 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats" } */
+/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats" } */
 
 void foo();
 void bla();
@@ -26,4 +26,4 @@  void thread_latch_through_header (void)
    case.  And we want to thread through the header as well.  These
    are both caught by threading in DOM.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 1 "vrp1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2g.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2g.c
index 6d1ff5d..ede03e0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2g.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2g.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats" } */
+/* { dg-options "-O2 -fdump-tree-dom2-details" } */
 
 void foo();
 void bla();
@@ -22,5 +22,8 @@  void dont_thread_1 (void)
     } while (i++ < 100);
 }
 
-/* { dg-final { scan-tree-dump "Jumps threaded: 2" "vrp1"} } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2"} } */
+/* This one can only be threaded if both paths to the
+   conditional inside the loop are threaded at the same
+   time.  Else we potentially end up with irreducible
+   loops.  */
+/* { dg-final { scan-tree-dump-not "IRREDUCIBLE_LOOP" "dom2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2h.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2h.c
index 61705e1..043918d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2h.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2h.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats" } */
+/* { dg-options "-O2 -fdump-tree-dom2-details" } */
 
 void foo();
 void bla();
@@ -25,5 +25,4 @@  void dont_thread_2 (int first)
 
 /* Peeling off the first iteration would make threading through
    the loop latch safe, but we don't do that currently.  */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1" "vrp1"} } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2"} } */
+/* { dg-final { scan-tree-dump-not "IRREDUCIBLE_LOOP" "dom2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index ab6b6eb..5ec4687 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,6 +1,7 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details" } */
-/* { dg-final { scan-tree-dump-times "FSM" 6 "dom2" } } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-final { scan-tree-dump-times "FSM" 3 "thread1" } } */
+/* { dg-final { scan-tree-dump-times "FSM" 4 "thread2" } } */
 
 int sum0, sum1, sum2, sum3;
 int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index a7a737b..7cd1451 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,7 +1,8 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats" } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 19"  "vrp1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 12" "dom2" } } */
+/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats" } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 16"  "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 11" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c
index f250d22..fa6da7b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-12.c
@@ -1,6 +1,8 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details" } */
-/* { dg-final { scan-tree-dump "FSM" "dom2" } } */
+/* { dg-options "-O2 -fdump-tree-thread2-details -fdump-tree-thread3-details -fdump-tree-thread4-details" } */
+/* { dg-final { scan-tree-dump "FSM" "thread2" } } */
+/* { dg-final { scan-tree-dump "FSM" "thread3" } } */
+/* { dg-final { scan-tree-dump "FSM" "thread4" } } */
 
 typedef struct bitmap_head_def *bitmap;
 typedef const struct bitmap_head_def *const_bitmap;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
index 99d45f5..f5f338b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
-/* { dg-final { scan-tree-dump "FSM" "vrp1" } } */
+/* { dg-options "-O2 -fdump-tree-thread1-details" } */
+/* { dg-final { scan-tree-dump "FSM" "thread1" } } */
 
 typedef struct rtx_def *rtx;
 typedef const struct rtx_def *const_rtx;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp56.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp56.c
index 481e888..3a64b01 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp56.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp56.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-stats" } */
 typedef struct basic_block_def *basic_block;
 struct basic_block_def;
 struct edge_def;
@@ -38,5 +38,5 @@  cleanup_empty_eh (basic_block bb)
 	foo ();
     }
 }
-/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp1"} } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 1" "thread1"} } */
 
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 170ee9a..362aa6e 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -156,6 +156,7 @@  DEFTIMEVAR (TV_TREE_SSA_OTHER	     , "tree SSA other")
 DEFTIMEVAR (TV_TREE_SSA_INCREMENTAL  , "tree SSA incremental")
 DEFTIMEVAR (TV_TREE_OPS	             , "tree operand scan")
 DEFTIMEVAR (TV_TREE_SSA_DOMINATOR_OPTS   , "dominator optimization")
+DEFTIMEVAR (TV_TREE_SSA_THREAD_JUMPS , "backwards jump threading")
 DEFTIMEVAR (TV_TREE_SRA              , "tree SRA")
 DEFTIMEVAR (TV_ISOLATE_ERRONEOUS_PATHS    , "isolate eroneous paths")
 DEFTIMEVAR (TV_TREE_CCP		     , "tree CCP")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 66e103a..36299a6 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -398,6 +398,7 @@  extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_thread_jumps (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_pre (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 636b67d..57f1f5c 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -33,6 +33,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "cfganal.h"
 #include "tree-pass.h"
+#include "gimple-ssa.h"
+#include "tree-phinodes.h"
 
 static int max_threaded_paths;
 
@@ -596,15 +598,13 @@  fsm_find_control_statement_thread_paths (tree name,
    finding a path where NAME is a constant, we can thread the path.  */
 
 void  
-find_jump_threads_backwards (edge e)
+find_jump_threads_backwards (basic_block bb)
 {     
   if (!flag_expensive_optimizations
-      || optimize_function_for_size_p (cfun)
-      || e->dest->loop_father != e->src->loop_father
-      || loop_depth (e->dest->loop_father) == 0)
+      || optimize_function_for_size_p (cfun))
     return;
 
-  gimple *stmt = get_gimple_control_stmt (e->dest);
+  gimple *stmt = get_gimple_control_stmt (bb);
   if (!stmt)
     return;
 
@@ -628,7 +628,7 @@  find_jump_threads_backwards (edge e)
 
   vec<basic_block, va_gc> *bb_path;
   vec_alloc (bb_path, 10);
-  vec_safe_push (bb_path, e->dest);
+  vec_safe_push (bb_path, bb);
   hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
 
   max_threaded_paths = PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATHS);
@@ -637,3 +637,60 @@  find_jump_threads_backwards (edge e)
   delete visited_bbs;
   vec_free (bb_path);
 }
+
+namespace {
+
+const pass_data pass_data_thread_jumps =
+{
+  GIMPLE_PASS,
+  "thread",
+  OPTGROUP_NONE,
+  TV_TREE_SSA_THREAD_JUMPS,
+  ( PROP_cfg | PROP_ssa ),
+  0,
+  0,
+  0,
+  ( TODO_cleanup_cfg | TODO_update_ssa ),
+};
+
+class pass_thread_jumps : public gimple_opt_pass
+{
+public:
+  pass_thread_jumps (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_thread_jumps, ctxt)
+  {}
+
+  opt_pass * clone (void) { return new pass_thread_jumps (m_ctxt); }
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *);
+};
+
+bool
+pass_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED)
+{
+  return (flag_expensive_optimizations
+	  && ! optimize_function_for_size_p (cfun));
+}
+
+
+unsigned int
+pass_thread_jumps::execute (function *fun)
+{
+  /* Try to thread each block with more than one successor.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      if (EDGE_COUNT (bb->succs) > 1)
+	find_jump_threads_backwards (bb);
+    }
+  thread_through_all_blocks (true);
+  return 0;
+}
+
+}
+
+gimple_opt_pass *
+make_pass_thread_jumps (gcc::context *ctxt)
+{
+  return new pass_thread_jumps (ctxt);
+}
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index eca3812..5fd5b98 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -34,7 +34,6 @@  along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "tree-ssa-scopedtables.h"
 #include "tree-ssa-threadedge.h"
-#include "tree-ssa-threadbackward.h"
 #include "tree-ssa-dom.h"
 #include "gimple-fold.h"
 
@@ -1183,8 +1182,6 @@  thread_across_edge (gcond *dummy_cond,
       path->release ();
       delete path;
 
-      find_jump_threads_backwards (e);
-
       /* A negative status indicates the target block was deemed too big to
 	 duplicate.  Just quit now rather than trying to use the block as
 	 a joiner in a jump threading path.
@@ -1231,10 +1228,7 @@  thread_across_edge (gcond *dummy_cond,
       {
 	if ((e->flags & EDGE_DFS_BACK) != 0
 	    || (taken_edge->flags & EDGE_DFS_BACK) != 0)
-	  {
-	    find_jump_threads_backwards (taken_edge);
-	    continue;
-	  }
+	  continue;
 
 	/* Push a fresh marker so we can unwind the equivalences created
 	   for each of E->dest's successors.  */
@@ -1282,10 +1276,7 @@  thread_across_edge (gcond *dummy_cond,
 	    register_jump_thread (path);
 	  }
 	else
-	  {
-	    find_jump_threads_backwards (path->last ()->e);
-	    delete_jump_thread_path (path);
-	  }
+	  delete_jump_thread_path (path);
 
 	/* And unwind the equivalence table.  */
 	if (avail_exprs_stack)