diff mbox series

Refactor jump_thread_path_registry.

Message ID 20210911083558.1692018-1-aldyh@redhat.com
State New
Headers show
Series Refactor jump_thread_path_registry.

Commit Message

Aldy Hernandez Sept. 11, 2021, 8:35 a.m. UTC
In an attempt to refactor thread_through_all_blocks(), I've realized
that there is a mess of code dealing with coexisting forward and
backward thread types.  However, this is an impossible scenario, as
the registry contains either forward/old-style threads, or backward
threads (EDGE_FSM_THREADs), never both.

The fact that both types of threads cannot coexist simplifies the
code considerably.  For that matter, it splits things up nicely
because there are some common bits that can go into a base class, and
some differing code that can go into derived classes.

Dividing things in this way makes it very obvious which parts belong in
the old-style copier and which parts belong to the generic copier.
Doing all this provided some nice cleanups, as well as fixing a latent
bug in adjust_paths_after_duplication.

The diff is somewhat hard to read, so perhaps looking at the final
output would be easier.

A general overview of what this patch achieves can be seen by just
looking at this simplified class layout:

// Abstract class for the jump thread registry.

class jt_path_registry
{
public:
  jt_path_registry ();
  virtual ~jt_path_registry ();
  bool register_jump_thread (vec<jump_thread_edge *> *);
  bool thread_through_all_blocks (bool peel_loop_headers);
  jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
  vec<jump_thread_edge *> *allocate_thread_path ();
protected:
  vec<vec<jump_thread_edge *> *> m_paths;
  unsigned long m_num_threaded_edges;
private:
  virtual bool update_cfg (bool peel_loop_headers) = 0;
};

// Forward threader path registry using a custom BB copier.

class fwd_jt_path_registry : public jt_path_registry
{
public:
  fwd_jt_path_registry ();
  ~fwd_jt_path_registry ();
  void remove_jump_threads_including (edge);
private:
  bool update_cfg (bool peel_loop_headers) override;
  void mark_threaded_blocks (bitmap threaded_blocks);
  bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
  bool thread_block (basic_block, bool noloop_only);
  bool thread_through_loop_header (class loop *loop,
                                   bool may_peel_loop_headers);
  class redirection_data *lookup_redirection_data (edge e, enum insert_option);
  hash_table<struct removed_edges> *m_removed_edges;
  hash_table<redirection_data> *m_redirection_data;
};

// Backward threader path registry using a generic BB copier.

class back_jt_path_registry : public jt_path_registry
{
private:
  bool update_cfg (bool peel_loop_headers) override;
  void adjust_paths_after_duplication (unsigned curr_path_num);
  bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
                              unsigned n_region, unsigned current_path_no);
  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
};

That is, the forward and backward bits have been completely split,
while deriving from a base class for the common functionality.
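
The base-class/derived-class split above can be sketched with a minimal,
hypothetical reduction (names mirror the patch, but the path payload is
reduced to ints and all CFG work is stubbed out): the base class owns the
shared driver and path storage, and each derived registry supplies its own
private update_cfg() virtual.

```cpp
#include <vector>

// Minimal sketch of the registry split: a non-virtual public driver
// in the base class dispatching to a copier-specific private virtual.
class registry_base
{
public:
  virtual ~registry_base () {}
  void register_path (int path) { m_paths.push_back (path); }
  // Shared entry point: common bookkeeping lives here; the
  // copier-specific CFG update is delegated to the virtual.
  bool thread_all ()
  {
    if (m_paths.empty ())
      return false;
    return update_cfg ();
  }
protected:
  std::vector<int> m_paths;
private:
  virtual bool update_cfg () = 0;
};

// Stand-in for fwd_jt_path_registry (custom BB copier).
class fwd_registry : public registry_base
{
  bool update_cfg () override { m_paths.clear (); return true; }
};

// Stand-in for back_jt_path_registry (generic BB copier).
class back_registry : public registry_base
{
  bool update_cfg () override { m_paths.clear (); return true; }
};
```

This is the same shape as jt_path_registry's pure-virtual update_cfg():
callers only ever see thread_through_all_blocks(), and the forward/backward
distinction is invisible at the call site.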

Most everything is mechanical, but there are a few gotchas:

a) back_jt_path_registry::update_cfg(), which contains the backward
threading specific bits, is rather simple, since most of the code in
the original thread_through_all_blocks() only applied to the forward
threader: removed edges, mark_threaded_blocks,
thread_through_loop_header, the copy tables (*).

(*) The back threader has its own copy tables in
duplicate_thread_path.

b) In some cases, adjust_paths_after_duplication() was commoning out
so many blocks that it was removing the initial EDGE_FSM_THREAD
marker.  I've fixed this.

c) AFAICT, when run from the forward threader,
thread_through_all_blocks() attempts to remove threads starting with
an edge already seen, but it would never see anything because the loop
doing the checking only has a visited_starting_edges.contains(), and
no corresponding visited_starting_edges.add().  The add() method in
thread_through_all_blocks belongs to the backward threading bits, and
as I've explained, both types cannot coexist.  I've removed the checks
in the forward bits since they don't appear to do anything.  If this
was an oversight, and we want to avoid threading already seen edges in
the forward threader, I can move this functionality to the base class.
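
The dead check in (c) can be reduced to a sketch like the following
(hypothetical: entry edges reduced to ints, and the real loop works over
m_paths and calls cancel_thread):

```cpp
#include <unordered_set>
#include <vector>

// Reduction of the check described above: the loop consults
// visited_starting_edges, but nothing ever add()s to it, so the
// "already seen" branch can never fire and every path survives.
std::vector<int> prune_already_seen (const std::vector<int> &entry_edges)
{
  std::unordered_set<int> visited_starting_edges;
  std::vector<int> kept;
  for (int entry : entry_edges)
    {
      // contains() with no corresponding add(): always false.
      if (visited_starting_edges.count (entry))
	continue;
      kept.push_back (entry);
    }
  return kept;
}
```

Since the set stays empty for the whole loop, the function returns its
input unchanged, which is why removing the check is behavior-preserving.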

Ultimately I would like to move all the registry code to
tree-ssa-threadregistry.*.  I've avoided this in this patch to aid in
review.

My apologies for this longass explanation, but I want to make sure
we're covering all of our bases.

Tested on x86-64 Linux by a very tedious process of moving chunks
around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
repeating ad nauseam.  And of course, by running a full bootstrap and
tests.

OK?

p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
type.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (class back_threader_registry): Use
	back_jt_path_registry.
	* tree-ssa-threadedge.c (jump_threader::jump_threader): Use
	fwd_jt_path_registry.
	* tree-ssa-threadedge.h (class jump_threader): Same.
	* tree-ssa-threadupdate.c
	(jump_thread_path_registry::jump_thread_path_registry): Rename...
	(jt_path_registry::jt_path_registry): ...to this.
	(jump_thread_path_registry::~jump_thread_path_registry): Rename...
	(jt_path_registry::~jt_path_registry): ...to this.
	(fwd_jt_path_registry::fwd_jt_path_registry): New.
	(fwd_jt_path_registry::~fwd_jt_path_registry): New.
	(jump_thread_path_registry::allocate_thread_edge): Rename...
	(jt_path_registry::allocate_thread_edge): ...to this.
	(jump_thread_path_registry::allocate_thread_path): Rename...
	(jt_path_registry::allocate_thread_path): ...to this.
	(jump_thread_path_registry::lookup_redirection_data): Rename...
	(fwd_jt_path_registry::lookup_redirection_data): ...to this.
	(jump_thread_path_registry::thread_block_1): Rename...
	(fwd_jt_path_registry::thread_block_1): ...to this.
	(jump_thread_path_registry::thread_block): Rename...
	(fwd_jt_path_registry::thread_block): ...to this.
	(jt_path_registry::thread_through_loop_header): Rename...
	(fwd_jt_path_registry::thread_through_loop_header): ...to this.
	(jump_thread_path_registry::mark_threaded_blocks): Rename...
	(fwd_jt_path_registry::mark_threaded_blocks): ...to this.
	(jump_thread_path_registry::debug_path): Rename...
	(jt_path_registry::debug_path): ...to this.
	(jump_thread_path_registry::dump): Rename...
	(jt_path_registry::debug): ...to this.
	(jump_thread_path_registry::rewire_first_differing_edge): Rename...
	(back_jt_path_registry::rewire_first_differing_edge): ...to this.
	(jump_thread_path_registry::adjust_paths_after_duplication): Rename...
	(back_jt_path_registry::adjust_paths_after_duplication): ...to this.
	(jump_thread_path_registry::duplicate_thread_path): Rename...
	(back_jt_path_registry::duplicate_thread_path): ...to this.  Also,
	drop ill-formed candidates.
	(jump_thread_path_registry::remove_jump_threads_including): Rename...
	(fwd_jt_path_registry::remove_jump_threads_including): ...to this.
	(jt_path_registry::thread_through_all_blocks): New.
	(back_jt_path_registry::update_cfg): New.
	(fwd_jt_path_registry::update_cfg): New.
	(jump_thread_path_registry::register_jump_thread): Rename...
	(jt_path_registry::register_jump_thread): ...to this.
	* tree-ssa-threadupdate.h (class jump_thread_path_registry):
	Abstract to...
	(class jt_path_registry): ...here.
	(class fwd_jt_path_registry): New.
	(class back_jt_path_registry): New.
---
 gcc/tree-ssa-threadbackward.c |   2 +-
 gcc/tree-ssa-threadedge.c     |   2 +-
 gcc/tree-ssa-threadedge.h     |   2 +-
 gcc/tree-ssa-threadupdate.c   | 213 +++++++++++++++++-----------------
 gcc/tree-ssa-threadupdate.h   |  60 ++++++----
 5 files changed, 149 insertions(+), 130 deletions(-)

Comments

Jeff Law Sept. 11, 2021, 3:31 p.m. UTC | #1
On 9/11/2021 2:35 AM, Aldy Hernandez wrote:
> In an attempt to refactor thread_through_all_blocks(), I've realized
> that there is a mess of code dealing with coexisting forward and
> backward thread types.  However, this is an impossible scenario, as
> the registry contains either forward/old-style threads, or backward
> threads (EDGE_FSM_THREADs), never both.
Correct.  There was a time when they both could show up, but that was 
fixed years ago.

>
> The fact that both types of threads cannot coexist simplifies the
> code considerably.  For that matter, it splits things up nicely
> because there are some common bits that can go into a base class, and
> some differing code that can go into derived classes.
Excellent.

>
> Dividing things in this way makes it very obvious which parts belong in
> the old-style copier and which parts belong to the generic copier.
> Doing all this provided some nice cleanups, as well as fixing a latent
> bug in adjust_paths_after_duplication.
Yea, I expected this could clean up the logic a bit and make everything 
easier to follow.  Thanks for tackling it, it's been on my TODO for a 
long time.

>
> The diff is somewhat hard to read, so perhaps looking at the final
> output would be easier.
>
> A general overview of what this patch achieves can be seen by just
> looking at this simplified class layout:
>
> // Abstract class for the jump thread registry.
>
> class jt_path_registry
> {
> public:
>    jt_path_registry ();
>    virtual ~jt_path_registry ();
>    bool register_jump_thread (vec<jump_thread_edge *> *);
>    bool thread_through_all_blocks (bool peel_loop_headers);
>    jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
>    vec<jump_thread_edge *> *allocate_thread_path ();
> protected:
>    vec<vec<jump_thread_edge *> *> m_paths;
>    unsigned long m_num_threaded_edges;
> private:
>    virtual bool update_cfg (bool peel_loop_headers) = 0;
> };
>
> // Forward threader path registry using a custom BB copier.
>
> class fwd_jt_path_registry : public jt_path_registry
> {
> public:
>    fwd_jt_path_registry ();
>    ~fwd_jt_path_registry ();
>    void remove_jump_threads_including (edge);
> private:
>    bool update_cfg (bool peel_loop_headers) override;
>    void mark_threaded_blocks (bitmap threaded_blocks);
>    bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
>    bool thread_block (basic_block, bool noloop_only);
>    bool thread_through_loop_header (class loop *loop,
>                                     bool may_peel_loop_headers);
>    class redirection_data *lookup_redirection_data (edge e, enum insert_option);
>    hash_table<struct removed_edges> *m_removed_edges;
>    hash_table<redirection_data> *m_redirection_data;
> };
>
> // Backward threader path registry using a generic BB copier.
>
> class back_jt_path_registry : public jt_path_registry
> {
> private:
>    bool update_cfg (bool peel_loop_headers) override;
>    void adjust_paths_after_duplication (unsigned curr_path_num);
>    bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
>                                unsigned n_region, unsigned current_path_no);
>    bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
> };
>
> That is, the forward and backward bits have been completely split,
> while deriving from a base class for the common functionality.
>
> Most everything is mechanical, but there are a few gotchas:
>
> a) back_jt_path_registry::update_cfg(), which contains the backward
> threading specific bits, is rather simple, since most of the code in
> the original thread_through_all_blocks() only applied to the forward
> threader: removed edges, mark_threaded_blocks,
> thread_through_loop_header, the copy tables (*).
>
> (*) The back threader has its own copy tables in
> duplicate_thread_path.
>
> b) In some cases, adjust_paths_after_duplication() was commoning out
> so many blocks that it was removing the initial EDGE_FSM_THREAD
> marker.  I've fixed this.
>
> c) AFAICT, when run from the forward threader,
> thread_through_all_blocks() attempts to remove threads starting with
> an edge already seen, but it would never see anything because the loop
> doing the checking only has a visited_starting_edges.contains(), and
> no corresponding visited_starting_edges.add().  The add() method in
> thread_through_all_blocks belongs to the backward threading bits, and
> as I've explained, both types cannot coexist.  I've removed the checks
> in the forward bits since they don't appear to do anything.  If this
> was an oversight, and we want to avoid threading already seen edges in
> the forward threader, I can move this functionality to the base class.
>
> Ultimately I would like to move all the registry code to
> tree-ssa-threadregistry.*.  I've avoided this in this patch to aid in
> review.
>
> My apologies for this longass explanation, but I want to make sure
> we're covering all of our bases.
>
> Tested on x86-64 Linux by a very tedious process of moving chunks
> around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
> repeating ad nauseam.  And of course, by running a full bootstrap and
> tests.
>
> OK?
>
> p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
> type.
So another thing to consider is that the threaders initially record 
their paths in different directions.  Forward threading records starting 
at the first block, backward from the final block.  At some point (I no 
longer remember where) we invert the backwards threader's path to fit 
the model that the registry wanted and I think we then convert the path 
into the structure that the generic copier wants at some later point.

In theory with the two better separated we might not need to do the 
conversions of the backwards threader's paths anymore.
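
As an illustration only (the real code works on vec<jump_thread_edge *>,
and where exactly the inversion happens is not pinned down in this thread),
the conversion being discussed amounts to a reversal like this:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: the backward threader records blocks from the
// final block toward the entry, so handing a path to the registry
// currently requires flipping it into entry-to-exit order.
std::vector<int> to_registry_order (std::vector<int> back_path)
{
  std::reverse (back_path.begin (), back_path.end ());
  return back_path;
}
```

With the two registries fully separated, the backward copier could in
principle consume the path in its recorded order and skip this step.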
>
> gcc/ChangeLog:
>
> 	* tree-ssa-threadbackward.c (class back_threader_registry): Use
> 	back_jt_path_registry.
> 	* tree-ssa-threadedge.c (jump_threader::jump_threader): Use
> 	fwd_jt_path_registry.
> 	* tree-ssa-threadedge.h (class jump_threader): Same.
> 	* tree-ssa-threadupdate.c
> 	(jump_thread_path_registry::jump_thread_path_registry): Rename...
> 	(jt_path_registry::jt_path_registry): ...to this.
> 	(jump_thread_path_registry::~jump_thread_path_registry): Rename...
> 	(jt_path_registry::~jt_path_registry): ...to this.
> 	(fwd_jt_path_registry::fwd_jt_path_registry): New.
> 	(fwd_jt_path_registry::~fwd_jt_path_registry): New.
> 	(jump_thread_path_registry::allocate_thread_edge): Rename...
> 	(jt_path_registry::allocate_thread_edge): ...to this.
> 	(jump_thread_path_registry::allocate_thread_path): Rename...
> 	(jt_path_registry::allocate_thread_path): ...to this.
> 	(jump_thread_path_registry::lookup_redirection_data): Rename...
> 	(fwd_jt_path_registry::lookup_redirection_data): ...to this.
> 	(jump_thread_path_registry::thread_block_1): Rename...
> 	(fwd_jt_path_registry::thread_block_1): ...to this.
> 	(jump_thread_path_registry::thread_block): Rename...
> 	(fwd_jt_path_registry::thread_block): ...to this.
> 	(jt_path_registry::thread_through_loop_header): Rename...
> 	(fwd_jt_path_registry::thread_through_loop_header): ...to this.
> 	(jump_thread_path_registry::mark_threaded_blocks): Rename...
> 	(fwd_jt_path_registry::mark_threaded_blocks): ...to this.
> 	(jump_thread_path_registry::debug_path): Rename...
> 	(jt_path_registry::debug_path): ...to this.
> 	(jump_thread_path_registry::dump): Rename...
> 	(jt_path_registry::debug): ...to this.
> 	(jump_thread_path_registry::rewire_first_differing_edge): Rename...
> 	(back_jt_path_registry::rewire_first_differing_edge): ...to this.
> 	(jump_thread_path_registry::adjust_paths_after_duplication): Rename...
> 	(back_jt_path_registry::adjust_paths_after_duplication): ...to this.
> 	(jump_thread_path_registry::duplicate_thread_path): Rename...
> 	(back_jt_path_registry::duplicate_thread_path): ...to this.  Also,
> 	drop ill-formed candidates.
> 	(jump_thread_path_registry::remove_jump_threads_including): Rename...
> 	(fwd_jt_path_registry::remove_jump_threads_including): ...to this.
> 	(jt_path_registry::thread_through_all_blocks): New.
> 	(back_jt_path_registry::update_cfg): New.
> 	(fwd_jt_path_registry::update_cfg): New.
> 	(jump_thread_path_registry::register_jump_thread): Rename...
> 	(jt_path_registry::register_jump_thread): ...to this.
> 	* tree-ssa-threadupdate.h (class jump_thread_path_registry):
> 	Abstract to...
> 	(class jt_path_registry): ...here.
> 	(class fwd_jt_path_registry): New.
> 	(class back_jt_path_registry): New.
OK.  I wouldn't be terribly surprised if there are further cleanups we can 
do, but this is clearly a step in the right direction.

jeff

ps.  Don't forget about the gcc.dg/torture/pr55107 regression I sent you 
yesterday.  It's showing up on a wide range of targets.
Aldy Hernandez Sept. 11, 2021, 6:01 p.m. UTC | #2
> So another thing to consider is that the threaders initially record 
> their paths in different directions.  Forward threading records starting 
> at the first block, backward from the final block.  At some point (I no 
> longer remember where) we invert the backwards threader's path to fit 
> the model that the registry wanted and I think we then convert the path 
> into the structure that the generic copier wants at some later point.

Yeah, that's in back_threader_registry::register_path, and then we go 
back and massage things again in back_jt_path_registry::update_cfg().

> 
> In theory with the two better separated we might not need to do the 
> conversions of the backwards threader's paths anymore.

Good point.  I'll think about it, as we need some coherent way of 
keeping track of paths that we can share, as I'm about to add yet 
another one in the jt_state business in tree-ssa-threadedge.h.

> ps.  Don't forget about gcc.dg/torture/pr55107 regression I sent you 
> yesterday.  It's showing up on a wide range of targets.

Yeah, thanks for mentioning it.  I'm putting it aside until Monday when 
I'm actually getting paid to chase tricky bugs :).

Thanks for the review.
Aldy
Jeff Law Sept. 13, 2021, 1:36 p.m. UTC | #3
On 9/11/2021 12:01 PM, Aldy Hernandez wrote:
>
>> So another thing to consider is that the threaders initially record 
>> their paths in different directions. Forward threading records 
>> starting at the first block, backward from the final block.  At some 
>> point (I no longer remember where) we invert the backwards threader's 
>> path to fit the model that the registry wanted and I think we then 
>> convert the path into the structure that the generic copier wants at 
>> some later point.
>
> Yeah, that's in back_threader_registry::register_path, and then we go 
> back and massage things again in back_jt_path_registry::update_cfg().
>
>>
>> In theory with the two better separated we might not need to do the 
>> conversions of the backwards threader's paths anymore.
>
> Good point.  I'll think about it, as we need some coherent way of 
> keeping track of paths that can we can share, as I'm about to add yet 
> another one in the jt_state business in tree-ssa-threadege.h.
>
>> ps.  Don't forget about gcc.dg/torture/pr55107 regression I sent you 
>> yesterday.  It's showing up on a wide range of targets.
>
> Yeah, thanks for mentioning it.  I'm putting it aside until Monday 
> when I'm actually getting paid to chase tricky bugs :).
Fair enough :-)  Works for me.  I've got a way to disable the test 
trivially until we sort out whether or not it's valid and/or useful.

jeff

Patch

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index e72992328de..7ff5cecbdab 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -56,7 +56,7 @@  public:
   bool register_path (const vec<basic_block> &, edge taken);
   bool thread_through_all_blocks (bool may_peel_loop_headers);
 private:
-  jump_thread_path_registry m_lowlevel_registry;
+  back_jt_path_registry m_lowlevel_registry;
   const int m_max_allowable_paths;
   int m_threaded_paths;
 };
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 3c7cdc58b93..422cb89401b 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -71,7 +71,7 @@  jump_threader::jump_threader (jump_threader_simplifier *simplifier,
   dummy_cond = gimple_build_cond (NE_EXPR, integer_zero_node,
 				  integer_zero_node, NULL, NULL);
 
-  m_registry = new jump_thread_path_registry ();
+  m_registry = new fwd_jt_path_registry ();
   m_simplifier = simplifier;
   m_state = state;
 }
diff --git a/gcc/tree-ssa-threadedge.h b/gcc/tree-ssa-threadedge.h
index 0002b200d8b..18e6bd41aaa 100644
--- a/gcc/tree-ssa-threadedge.h
+++ b/gcc/tree-ssa-threadedge.h
@@ -75,7 +75,7 @@  private:
   // Dummy condition to avoid creating lots of throw away statements.
   gcond *dummy_cond;
 
-  class jump_thread_path_registry *m_registry;
+  class fwd_jt_path_registry *m_registry;
   jump_threader_simplifier *m_simplifier;
   jt_state *m_state;
 };
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 18f16efbb7a..93538104fdf 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -167,29 +167,36 @@  jump_thread_path_allocator::allocate_thread_path ()
   return new (r) vec<jump_thread_edge *> ();
 }
 
-jump_thread_path_registry::jump_thread_path_registry ()
+jt_path_registry::jt_path_registry ()
 {
   m_paths.create (5);
-  m_removed_edges = new hash_table<struct removed_edges> (17);
   m_num_threaded_edges = 0;
-  m_redirection_data = NULL;
 }
 
-jump_thread_path_registry::~jump_thread_path_registry ()
+jt_path_registry::~jt_path_registry ()
 {
   m_paths.release ();
+}
+
+fwd_jt_path_registry::fwd_jt_path_registry ()
+{
+  m_removed_edges = new hash_table<struct removed_edges> (17);
+  m_redirection_data = NULL;
+}
+
+fwd_jt_path_registry::~fwd_jt_path_registry ()
+{
   delete m_removed_edges;
 }
 
 jump_thread_edge *
-jump_thread_path_registry::allocate_thread_edge (edge e,
-						 jump_thread_edge_type t)
+jt_path_registry::allocate_thread_edge (edge e, jump_thread_edge_type t)
 {
   return m_allocator.allocate_thread_edge (e, t);
 }
 
 vec<jump_thread_edge *> *
-jump_thread_path_registry::allocate_thread_path ()
+jt_path_registry::allocate_thread_path ()
 {
   return m_allocator.allocate_thread_path ();
 }
@@ -426,8 +433,7 @@  create_block_for_threading (basic_block bb,
    edges associated with E in the hash table.  */
 
 redirection_data *
-jump_thread_path_registry::lookup_redirection_data (edge e,
-						    enum insert_option insert)
+fwd_jt_path_registry::lookup_redirection_data (edge e, insert_option insert)
 {
   struct redirection_data **slot;
   struct redirection_data *elt;
@@ -1413,9 +1419,9 @@  redirection_block_p (basic_block bb)
    If JOINERS is true, then thread through joiner blocks as well.  */
 
 bool
-jump_thread_path_registry::thread_block_1 (basic_block bb,
-					   bool noloop_only,
-					   bool joiners)
+fwd_jt_path_registry::thread_block_1 (basic_block bb,
+				      bool noloop_only,
+				      bool joiners)
 {
   /* E is an incoming edge into BB that we may or may not want to
      redirect to a duplicate of BB.  */
@@ -1594,7 +1600,7 @@  jump_thread_path_registry::thread_block_1 (basic_block bb,
    opportunity.  */
 
 bool
-jump_thread_path_registry::thread_block (basic_block bb, bool noloop_only)
+fwd_jt_path_registry::thread_block (basic_block bb, bool noloop_only)
 {
   bool retval;
   retval = thread_block_1 (bb, noloop_only, false);
@@ -1675,9 +1681,8 @@  determine_bb_domination_status (class loop *loop, basic_block bb)
    to the inside of the loop.  */
 
 bool
-jump_thread_path_registry::thread_through_loop_header
-				(class loop *loop,
-				 bool may_peel_loop_headers)
+fwd_jt_path_registry::thread_through_loop_header (class loop *loop,
+						  bool may_peel_loop_headers)
 {
   basic_block header = loop->header;
   edge e, tgt_edge, latch = loop_latch_edge (loop);
@@ -1932,7 +1937,7 @@  count_stmts_and_phis_in_block (basic_block bb)
    hash table lookups to map from threaded edge to new target.  */
 
 void
-jump_thread_path_registry::mark_threaded_blocks (bitmap threaded_blocks)
+fwd_jt_path_registry::mark_threaded_blocks (bitmap threaded_blocks)
 {
   unsigned int i;
   bitmap_iterator bi;
@@ -2197,7 +2202,7 @@  bb_in_bbs (basic_block bb, basic_block *bbs, int n)
 }
 
 void
-jump_thread_path_registry::debug_path (FILE *dump_file, int pathno)
+jt_path_registry::debug_path (FILE *dump_file, int pathno)
 {
   vec<jump_thread_edge *> *p = m_paths[pathno];
   fprintf (dump_file, "path: ");
@@ -2208,7 +2213,7 @@  jump_thread_path_registry::debug_path (FILE *dump_file, int pathno)
 }
 
 void
-jump_thread_path_registry::dump ()
+jt_path_registry::debug ()
 {
   for (unsigned i = 0; i < m_paths.length (); ++i)
     debug_path (stderr, i);
@@ -2223,8 +2228,8 @@  jump_thread_path_registry::dump ()
    Returns TRUE if we were able to successfully rewire the edge.  */
 
 bool
-jump_thread_path_registry::rewire_first_differing_edge (unsigned path_num,
-							unsigned edge_num)
+back_jt_path_registry::rewire_first_differing_edge (unsigned path_num,
+						    unsigned edge_num)
 {
   vec<jump_thread_edge *> *path = m_paths[path_num];
   edge &e = (*path)[edge_num]->e;
@@ -2269,11 +2274,9 @@  jump_thread_path_registry::rewire_first_differing_edge (unsigned path_num,
    specifies the path that was just threaded.  */
 
 void
-jump_thread_path_registry::adjust_paths_after_duplication
-	(unsigned curr_path_num)
+back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num)
 {
   vec<jump_thread_edge *> *curr_path = m_paths[curr_path_num];
-  gcc_assert ((*curr_path)[0]->type == EDGE_FSM_THREAD);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2347,8 +2350,16 @@  jump_thread_path_registry::adjust_paths_after_duplication
 	      m_paths.unordered_remove (cand_path_num);
 	      continue;
 	    }
-	  /* Otherwise, just remove the redundant sub-path.  */
-	  cand_path->block_remove (0, j);
+	  if ((*cand_path)[j]->type != EDGE_FSM_THREAD)
+	    {
+	      /* If all the EDGE_FSM_THREADs are common, all that's
+		 left is the final EDGE_NO_COPY_SRC_BLOCK.  */
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Dropping illformed candidate.\n");
+	    }
+	  else
+	    /* Otherwise, just remove the redundant sub-path.  */
+	    cand_path->block_remove (0, j);
 	}
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -2372,11 +2383,11 @@  jump_thread_path_registry::adjust_paths_after_duplication
    Returns false if it is unable to copy the region, true otherwise.  */
 
 bool
-jump_thread_path_registry::duplicate_thread_path (edge entry,
-						  edge exit,
-						  basic_block *region,
-						  unsigned n_region,
-						  unsigned current_path_no)
+back_jt_path_registry::duplicate_thread_path (edge entry,
+					      edge exit,
+					      basic_block *region,
+					      unsigned n_region,
+					      unsigned current_path_no)
 {
   unsigned i;
   class loop *loop = entry->dest->loop_father;
@@ -2551,7 +2562,7 @@  valid_jump_thread_path (vec<jump_thread_edge *> *path)
    DOM/VRP rather than for every case where DOM optimizes away a COND_EXPR.  */
 
 void
-jump_thread_path_registry::remove_jump_threads_including (edge_def *e)
+fwd_jt_path_registry::remove_jump_threads_including (edge_def *e)
 {
   if (!m_paths.exists ())
     return;
@@ -2560,69 +2571,52 @@  jump_thread_path_registry::remove_jump_threads_including (edge_def *e)
   *slot = e;
 }
 
-/* Walk through all blocks and thread incoming edges to the appropriate
-   outgoing edge for each edge pair recorded in THREADED_EDGES.
+/* Thread all paths that have been queued for jump threading, and
+   update the CFG accordingly.
 
    It is the caller's responsibility to fix the dominance information
    and rewrite duplicated SSA_NAMEs back into SSA form.
 
-   If MAY_PEEL_LOOP_HEADERS is false, we avoid threading edges through
-   loop headers if it does not simplify the loop.
+   If PEEL_LOOP_HEADERS is false, avoid threading edges through loop
+   headers if it does not simplify the loop.
 
-   Returns true if one or more edges were threaded, false otherwise.  */
+   Returns true if one or more edges were threaded.  */
 
 bool
-jump_thread_path_registry::thread_through_all_blocks
-	(bool may_peel_loop_headers)
+jt_path_registry::thread_through_all_blocks (bool peel_loop_headers)
 {
-  bool retval = false;
-  unsigned int i;
-  auto_bitmap threaded_blocks;
-  hash_set<edge> visited_starting_edges;
-
-  if (!m_paths.exists ())
-    {
-      retval = false;
-      goto out;
-    }
+  if (m_paths.length () == 0)
+    return false;
 
   m_num_threaded_edges = 0;
 
-  /* Remove any paths that referenced removed edges.  */
-  if (m_removed_edges)
-    for (i = 0; i < m_paths.length (); )
-      {
-	unsigned int j;
-	vec<jump_thread_edge *> *path = m_paths[i];
+  bool retval = update_cfg (peel_loop_headers);
 
-	for (j = 0; j < path->length (); j++)
-	  {
-	    edge e = (*path)[j]->e;
-	    if (m_removed_edges->find_slot (e, NO_INSERT))
-	      break;
-	  }
+  statistics_counter_event (cfun, "Jumps threaded", m_num_threaded_edges);
 
-	if (j != path->length ())
-	  {
-	    cancel_thread (path, "Thread references removed edge");
-	    m_paths.unordered_remove (i);
-	    continue;
-	  }
-	i++;
-      }
+  if (retval)
+    {
+      loops_state_set (LOOPS_NEED_FIXUP);
+      return true;
+    }
+  return false;
+}
 
-  /* Jump-thread all FSM threads before other jump-threads.  */
-  for (i = 0; i < m_paths.length ();)
+/* This is the backward threader version of thread_through_all_blocks
+   using a generic BB copier.  */
+
+bool
+back_jt_path_registry::update_cfg (bool /*peel_loop_headers*/)
+{
+  bool retval = false;
+  hash_set<edge> visited_starting_edges;
+
+  while (m_paths.length ())
     {
-      vec<jump_thread_edge *> *path = m_paths[i];
+      vec<jump_thread_edge *> *path = m_paths[0];
       edge entry = (*path)[0]->e;
 
-      /* Only code-generate FSM jump-threads in this loop.  */
-      if ((*path)[0]->type != EDGE_FSM_THREAD)
-	{
-	  i++;
-	  continue;
-	}
+      gcc_checking_assert ((*path)[0]->type == EDGE_FSM_THREAD);
 
       /* Do not jump-thread twice from the same starting edge.
 
@@ -2638,8 +2632,8 @@  jump_thread_path_registry::thread_through_all_blocks
 	  || !valid_jump_thread_path (path))
 	{
 	  /* Remove invalid FSM jump-thread paths.  */
-	  cancel_thread (path, "Invalid FSM jump-thread path");
-	  m_paths.unordered_remove (i);
+	  cancel_thread (path, "Avoiding threading twice from same edge");
+	  m_paths.unordered_remove (0);
 	  continue;
 	}
 
@@ -2650,7 +2644,7 @@  jump_thread_path_registry::thread_through_all_blocks
       for (unsigned int j = 0; j < len - 1; j++)
 	region[j] = (*path)[j]->e->dest;
 
-      if (duplicate_thread_path (entry, exit, region, len - 1, i))
+      if (duplicate_thread_path (entry, exit, region, len - 1, 0))
 	{
 	  /* We do not update dominance info.  */
 	  free_dominance_info (CDI_DOMINATORS);
@@ -2660,27 +2654,44 @@  jump_thread_path_registry::thread_through_all_blocks
 	}
 
       path->release ();
-      m_paths.unordered_remove (i);
+      m_paths.unordered_remove (0);
       free (region);
     }
+  return retval;
+}
 
-  /* Remove from PATHS all the jump-threads starting with an edge already
-     jump-threaded.  */
-  for (i = 0; i < m_paths.length ();)
-    {
-      vec<jump_thread_edge *> *path = m_paths[i];
-      edge entry = (*path)[0]->e;
+/* This is the forward threader version of thread_through_all_blocks,
+   using a custom BB copier.  */
 
-      /* Do not jump-thread twice from the same block.  */
-      if (visited_starting_edges.contains (entry))
-	{
-	  cancel_thread (path, "Avoiding threading twice from same BB");
-	  m_paths.unordered_remove (i);
-	}
-      else
+bool
+fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
+{
+  bool retval = false;
+
+  /* Remove any paths that referenced removed edges.  */
+  if (m_removed_edges)
+    for (unsigned i = 0; i < m_paths.length (); )
+      {
+	unsigned int j;
+	vec<jump_thread_edge *> *path = m_paths[i];
+
+	for (j = 0; j < path->length (); j++)
+	  {
+	    edge e = (*path)[j]->e;
+	    if (m_removed_edges->find_slot (e, NO_INSERT))
+	      break;
+	  }
+
+	if (j != path->length ())
+	  {
+	    cancel_thread (path, "Thread references removed edge");
+	    m_paths.unordered_remove (i);
+	    continue;
+	  }
 	i++;
-    }
+      }
 
+  auto_bitmap threaded_blocks;
   mark_threaded_blocks (threaded_blocks);
 
   initialize_original_copy_tables ();
@@ -2737,16 +2748,8 @@  jump_thread_path_registry::thread_through_all_blocks
 	gcc_assert (e->aux == NULL);
     }
 
-  statistics_counter_event (cfun, "Jumps threaded", m_num_threaded_edges);
-
   free_original_copy_tables ();
 
-  m_paths.release ();
-
-  if (retval)
-    loops_state_set (LOOPS_NEED_FIXUP);
-
- out:
   return retval;
 }
 
@@ -2761,7 +2764,7 @@  jump_thread_path_registry::thread_through_all_blocks
    Return TRUE if PATH was successfully threaded.  */
 
 bool
-jump_thread_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
+jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
 {
   if (!dbg_cnt (registered_jump_thread))
     {
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 2030bda15af..58e3a38e0c5 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -54,49 +54,65 @@  private:
   obstack m_obstack;
 };
 
-// This is the underlying jump thread registry.  When all candidates
-// have been registered with register_jump_thread(),
-// thread_through_all_blocks() is called to actually change the CFG.
+// Abstract class for the jump thread registry.
+//
+// When all candidates have been registered with
+// register_jump_thread(), thread_through_all_blocks() is called to
+// update the CFG.
 
-class jump_thread_path_registry
+class jt_path_registry
 {
 public:
-  jump_thread_path_registry ();
-  ~jump_thread_path_registry ();
+  jt_path_registry ();
+  virtual ~jt_path_registry ();
   bool register_jump_thread (vec<jump_thread_edge *> *);
-  void remove_jump_threads_including (edge);
-  bool thread_through_all_blocks (bool);
+  bool thread_through_all_blocks (bool peel_loop_headers);
   jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
   vec<jump_thread_edge *> *allocate_thread_path ();
-  void dump ();
+  void debug ();
+protected:
+  void debug_path (FILE *, int pathno);
+  vec<vec<jump_thread_edge *> *> m_paths;
+  unsigned long m_num_threaded_edges;
+private:
+  virtual bool update_cfg (bool peel_loop_headers) = 0;
+  jump_thread_path_allocator m_allocator;
+  DISABLE_COPY_AND_ASSIGN (jt_path_registry);
+};
+
+// Forward threader path registry using a custom BB copier.
 
+class fwd_jt_path_registry : public jt_path_registry
+{
+public:
+  fwd_jt_path_registry ();
+  ~fwd_jt_path_registry ();
+  void remove_jump_threads_including (edge);
 private:
-  void debug_path (FILE *, int pathno);
+  bool update_cfg (bool peel_loop_headers) override;
   void mark_threaded_blocks (bitmap threaded_blocks);
-  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
-  void adjust_paths_after_duplication (unsigned curr_path_num);
-  bool duplicate_thread_path (edge entry,
-			      edge exit,
-			      basic_block *region,
-			      unsigned n_region,
-			      unsigned current_path_no);
   bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
   bool thread_block (basic_block, bool noloop_only);
   bool thread_through_loop_header (class loop *loop,
 				   bool may_peel_loop_headers);
   class redirection_data *lookup_redirection_data (edge e, enum insert_option);
 
-  vec<vec<jump_thread_edge *> *> m_paths;
-
   hash_table<struct removed_edges> *m_removed_edges;
 
   // Main data structure to hold information for duplicates of BB.
   hash_table<redirection_data> *m_redirection_data;
+};
 
-  // Jump threading statistics.
-  unsigned long m_num_threaded_edges;
+// Backward threader path registry using a generic BB copier.
 
-  jump_thread_path_allocator m_allocator;
+class back_jt_path_registry : public jt_path_registry
+{
+private:
+  bool update_cfg (bool peel_loop_headers) override;
+  void adjust_paths_after_duplication (unsigned curr_path_num);
+  bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
+			      unsigned n_region, unsigned current_path_no);
+  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
 };
 
 // Rather than search all the edges in jump thread paths each time DOM