
Improving jump-thread pass for PR 54742

Message ID 20141026213433.GA22140@f1.c.bardezibar.internal
State New

Commit Message

Sebastian Pop Oct. 26, 2014, 9:34 p.m. UTC
Sebastian Pop wrote:
> Jeff Law wrote:
> > On 08/21/14 04:30, Richard Biener wrote:
> > >>It turns Jeff's jump-threading code into a strange franken-pass of bits and
> > >>pieces of detection and optimisation, and would need some substantial
> > >>reworking to fit in with Jeff's changes last Autumn, but if it is more
> > >>likely to be acceptable for trunk then perhaps we could look to revive it.
> > >>It would be nice to reuse the path copy code Jeff added last year, but I
> > >>don't have much intuition as to how feasible that is.
> > >>
> > >>Was this the sort of thing that you were imagining?
> > >
> > >Yeah, didn't look too closely though.
> > It'd be pretty ugly I suspect.  But it's probably worth pondering
> > since that approach would eliminate the concerns about the cost of
> > detection (which is problematical for the jump threader) by using
> > Steve's code for that.
> > 
> > On the update side, I suspect most, if not all of the framework is
> > in place to handle this kind of update if the right threading paths
> > were passed to the updater.  I can probably cobble together that
> > by-hand and see what the tree-ssa-threadupdate does with it.  But
> > it'll be a week or so before I could look at it.
> 
> I adapted the patch James sent last year to use the new update paths

Attached an updated version of the patch.

> mechanism.  I verified that the attached patch does register all the paths that
> need to be threaded.  Thread updater seems to have some problems handling the
> attached testcase (a simplified version of the testcase attached to the bug.)
> 
> Jeff, could you please have a look at why the jump-thread updater is crashing?

I have tried to understand why the code generation part ICEs on coremark: the
first problem that I have seen is that tree-ssa-threadupdate.c does not handle
more than one joiner block per path to be threaded, so we would not be able to
jump thread across both the joiner of the if condition and the joiner of the
switch condition; i.e., these paths:

patch:   Registering jump thread: (7, 10) incoming edge;  (10, 25) joiner;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
patch:   Registering jump thread: (28, 10) incoming edge;  (10, 25) joiner;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 11) nocopy;
patch:   Registering jump thread: (8, 10) incoming edge;  (10, 25) joiner;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
patch:   Registering jump thread: (9, 10) incoming edge;  (10, 25) joiner;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
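
The limitation can be illustrated with a small sketch (the types here are
hypothetical stand-ins, not GCC's actual jump_thread_edge API): a check that
scans a recorded path and rejects it when it contains more than one joiner
block, which is effectively what tree-ssa-threadupdate.c requires today.

```cpp
#include <vector>

// Hypothetical mirror of a recorded jump-thread edge: the block pair
// plus the copy kind printed in the dumps above.
enum class EdgeKind { Incoming, Joiner, Normal, NoCopy };

struct ThreadEdge {
  int src, dst;
  EdgeKind kind;
};

// The updater currently copes with at most one joiner block per path,
// so a path such as (7, 10) incoming; (10, 25) joiner; (25, 26) joiner;
// ... cannot be threaded.
bool updater_can_handle(const std::vector<ThreadEdge> &path) {
  int joiners = 0;
  for (const ThreadEdge &e : path)
    if (e.kind == EdgeKind::Joiner)
      ++joiners;
  return joiners <= 1;
}
```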

Another problem is that we attach the path to be threaded to the ->aux field of
the first edge in the path, so when several paths start with the same edge we
can keep track of only one of them and have to cancel the others.
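
A minimal sketch of the bookkeeping change this suggests (edges modeled as
integer ids rather than GCC's edge type): instead of hanging a single path off
e->aux and cancelling any later path registered through the same incoming
edge, keep a map from the incoming edge to every path registered through it.

```cpp
#include <map>
#include <vector>

using Path = std::vector<int>;  // sequence of basic-block ids

// With e->aux there is room for exactly one path per incoming edge; a
// second registration through the same edge would have to be cancelled.
// A map keyed by the incoming edge keeps all of them.
std::map<int, std::vector<Path>> registered;

void register_jump_thread_sketch(int incoming_edge, const Path &path) {
  registered[incoming_edge].push_back(path);
}
```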

For coremark, we would discover some jump-thread paths from one of the switch
cases over the loop exit condition, either to bb_27 outside the loop, or to bb_4
staying inside the loop.  Then, with the "patch:", we would discover jump threads
that thread switch cases to switch cases; because these paths start with the
same edges for which we have already assigned a path to e->aux, we have to
cancel the interesting threads added by the patch:

  Registering jump thread: (12, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
  Registering jump thread: (13, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
  Registering jump thread: (29, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
  Registering jump thread: (31, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
  Registering jump thread: (16, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
  Registering jump thread: (15, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
  Registering jump thread: (32, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
  Registering jump thread: (19, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
  Registering jump thread: (18, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
  Registering jump thread: (22, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
  Registering jump thread: (21, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
  Registering jump thread: (34, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
  Registering jump thread: (33, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
  Registering jump thread: (35, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
  Registering jump thread: (24, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
patch:   Registering jump thread: (12, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
patch:   Registering jump thread: (16, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
patch:   Registering jump thread: (19, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
patch:   Registering jump thread: (22, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
patch:   Registering jump thread: (34, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
patch:   Registering jump thread: (35, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
patch:   Registering jump thread: (29, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
patch:   Registering jump thread: (13, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
patch:   Registering jump thread: (15, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
patch:   Registering jump thread: (31, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
patch:   Registering jump thread: (18, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 20) nocopy;
patch:   Registering jump thread: (32, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
patch:   Registering jump thread: (21, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 23) nocopy;
patch:   Registering jump thread: (33, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy;
patch:   Registering jump thread: (24, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy;
  Registering jump thread: (6, 36) incoming edge;  (36, 7) normal;
  Cancelling jump thread: (12, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
  Cancelling jump thread: (16, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
  Cancelling jump thread: (19, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
  Cancelling jump thread: (22, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
  Cancelling jump thread: (34, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
  Cancelling jump thread: (35, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
  Cancelling jump thread: (29, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
  Cancelling jump thread: (13, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
  Cancelling jump thread: (15, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
  Cancelling jump thread: (31, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
  Cancelling jump thread: (18, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 20) nocopy;
  Cancelling jump thread: (32, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
  Cancelling jump thread: (21, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 23) nocopy;
  Cancelling jump thread: (33, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy;
  Cancelling jump thread: (24, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy;

Here is the structure of the CFG with the loops:

(gdb) p debug_loops (2)
loop_0 (header = 0, latch = 1, niter = )
{
  bb_2 (preds = {bb_0 }, succs = {bb_3 bb_27 })
  bb_3 (preds = {bb_2 }, succs = {bb_5 bb_6 })
  bb_5 (preds = {bb_4 bb_3 }, succs = {bb_27 })
  bb_6 (preds = {bb_3 }, succs = {bb_36 })
  bb_27 (preds = {bb_5 bb_25 bb_26 bb_2 }, succs = {bb_1 })
  loop_1 (header = 36, latch = 37, niter = )
  {
    bb_4 (preds = {bb_26 }, succs = {bb_5 bb_37 })
    bb_37 (preds = {bb_4 }, succs = {bb_36 })
    bb_36 (preds = {bb_6 bb_37 }, succs = {bb_25 bb_7 bb_11 bb_20 bb_14 bb_17 bb_23 bb_24 })
    bb_7 (preds = {bb_36 }, succs = {bb_10 bb_28 })
    bb_8 (preds = {bb_28 }, succs = {bb_10 bb_9 })
    bb_9 (preds = {bb_8 }, succs = {bb_10 })
    bb_10 (preds = {bb_7 bb_28 bb_8 bb_9 }, succs = {bb_25 })
    bb_11 (preds = {bb_36 }, succs = {bb_29 bb_30 })
    bb_12 (preds = {bb_30 }, succs = {bb_25 })
    bb_13 (preds = {bb_30 }, succs = {bb_25 })
    bb_14 (preds = {bb_36 }, succs = {bb_15 bb_16 })
    bb_15 (preds = {bb_14 }, succs = {bb_25 })
    bb_16 (preds = {bb_14 }, succs = {bb_25 bb_31 })
    bb_17 (preds = {bb_36 }, succs = {bb_18 bb_19 })
    bb_18 (preds = {bb_17 }, succs = {bb_25 })
    bb_19 (preds = {bb_17 }, succs = {bb_25 bb_32 })
    bb_20 (preds = {bb_36 }, succs = {bb_21 bb_22 })
    bb_21 (preds = {bb_20 }, succs = {bb_25 })
    bb_22 (preds = {bb_20 }, succs = {bb_25 })
    bb_23 (preds = {bb_36 }, succs = {bb_33 bb_34 })
    bb_24 (preds = {bb_36 }, succs = {bb_25 bb_35 })
    bb_25 (preds = {bb_10 bb_12 bb_16 bb_19 bb_22 bb_34 bb_35 bb_36 bb_29 bb_13 bb_15 bb_31 bb_18 bb_32 bb_21 bb_33 bb_24 }, succs = {bb_26 bb_27 })
    bb_26 (preds = {bb_25 }, succs = {bb_4 bb_27 })
    bb_28 (preds = {bb_7 }, succs = {bb_10 bb_8 })
    bb_29 (preds = {bb_11 }, succs = {bb_25 })
    bb_30 (preds = {bb_11 }, succs = {bb_12 bb_13 })
    bb_31 (preds = {bb_16 }, succs = {bb_25 })
    bb_32 (preds = {bb_19 }, succs = {bb_25 })
    bb_33 (preds = {bb_23 }, succs = {bb_25 })
    bb_34 (preds = {bb_23 }, succs = {bb_25 })
    bb_35 (preds = {bb_24 }, succs = {bb_25 })
  }
}

What about removing the use of e->aux in threadupdate.c, to be able to jump
thread across all the recorded paths?

Thanks,
Sebastian

Comments

Jeff Law Nov. 19, 2014, 10:24 p.m. UTC | #1
On 10/26/14 15:34, Sebastian Pop wrote:
>
> I have tried to understand why the code generation part ICEs on coremark: the
> first problem that I have seen is that tree-ssa-threadupdate.c does not handle
> more than one joiner block per path to be threaded, so we would not be able to
> jump thread across both the joiner of the if condition and the joiner of the
> switch condition; i.e., these paths:
Right.  There's nothing I can think of that inherently makes that
impossible; it's just not something the code currently tries to support.
I suspect there's a few places that need light generalization.  I can
think of one or two offhand.

It's not lost on me that what we're building is a specialized region
cloner.  I keep wanting to go back and see if there's a representation
where we have an incoming edge, a series of blocks, and an outgoing edge.
The concept of a "join" block really isn't important; it's just a block
that we want to copy but whose destination we don't know.

Anyway, within that representation, I think the way to wire up the edges 
is simple if we build a map at the beginning of the process.   The set 
of blocks in the path all get clones.  There's a single edge into the 
cloned path (the incoming edge) and a single edge out (the edge 
corresponding to the statically computed destination of the path). 
Edges that were interior in the original path are kept interior in the 
clone.  Edges that reached outside the original path go from the clone 
to the original destinations outside the path.  It's a SEME region.
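
That wiring rule can be sketched as follows (plain adjacency lists standing in
for GCC's CFG, and clone ids chosen as original id + offset purely for this
sketch): clone every block in the region, then redirect each successor edge
either to the clone of its destination (interior edge) or to the original
destination (edge leaving the region).

```cpp
#include <map>
#include <set>
#include <vector>

// Original CFG: block id -> successor block ids.
using Cfg = std::map<int, std::vector<int>>;

// Clone the region formed by the blocks in PATH.  Returns the successor
// lists of the clones; a block B is cloned as B + OFFSET.
std::map<int, std::vector<int>>
clone_region(const Cfg &cfg, const std::vector<int> &path, int offset) {
  std::set<int> in_region(path.begin(), path.end());
  std::map<int, std::vector<int>> clones;
  for (int bb : path) {
    std::vector<int> succs;
    for (int dest : cfg.at(bb))
      // Edges interior to the region stay interior in the clone; edges
      // reaching outside go from the clone to the original destination.
      succs.push_back(in_region.count(dest) ? dest + offset : dest);
    clones[bb + offset] = succs;
  }
  return clones;
}
```

The incoming edge would then simply be redirected to the clone of the first
block, leaving a SEME region as described above.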


>
> Another problem is that we attach the path to be threaded to the ->aux field of
> the first edge in the path, so when several paths start with the same edge we
> can keep track of only one of them and have to cancel the others.
What I can easily see is cases where you have two paths starting at a 
particular node where one path is a strict subset of the other. 
Assuming the amount of copying we're doing is reasonable, you'd 
want to keep just the longer path.

But I don't think that's the case you're struggling with.  We could (for 
example) have a case where all the successors of a join block become 
threadable, possibly to different destinations.

That would argue that we really want to handle the threads off a 
different data structure than e->aux.


jeff

Patch

From bac0f2a390048652910f77503b21b3e208daeae1 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <s.pop@samsung.com>
Date: Fri, 26 Sep 2014 14:54:20 -0500
Subject: [PATCH] jump thread for PR 54742

Adapted from a patch from James Greenhalgh.

	* tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the
	original value of cond when simplification fails.
	(find_thread_path): New.
	(find_control_statement_thread_paths): New.
	(thread_through_normal_block): Call find_control_statement_thread_paths.

	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c |  32 ++++
 gcc/tree-ssa-threadedge.c                        | 180 ++++++++++++++++++++++-
 gcc/tree-ssa-threadupdate.c                      |   4 +
 3 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
new file mode 100644
index 0000000..f3ef725
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -0,0 +1,32 @@ 
+int sum0, sum1, sum2, sum3;
+int foo(char * s, char** ret)
+{
+  int state=0;
+  char c;
+
+  for (; *s && state != 4; s++)
+    {
+      c = *s;
+      if (c == '*')
+	{
+	  s++;
+	  break;
+	}
+      switch (state) {
+	case 0:
+	  if (c == '+') state = 1;
+	  else if (c != '-') sum0+=c;
+	  break;
+	case 1:
+	  if (c == '+') state = 2;
+	  else if (c == '-') state = 0;
+	  else sum1+=c;
+	  break;
+	default:
+	  break;
+      }
+
+    }
+  *ret = s;
+  return state;
+}
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 3dee5ba..7b9e5b6 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -628,6 +628,7 @@  simplify_control_stmt_condition (edge e,
      rather than use a relational operator.  These are simpler to handle.  */
   if (TREE_CODE (cond) == SSA_NAME)
     {
+      tree original_lhs = cond;
       cached_lhs = cond;
 
       /* Get the variable's current value from the equivalence chains.
@@ -656,6 +657,12 @@  simplify_control_stmt_condition (edge e,
 	 pass specific callback to try and simplify it further.  */
       if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
         cached_lhs = (*simplify) (stmt, stmt);
+
+      /* We couldn't find an invariant.  But, callers of this
+	 function may be able to do something useful with the
+	 unmodified destination.  */
+      if (!cached_lhs)
+	cached_lhs = original_lhs;
     }
   else
     cached_lhs = NULL;
@@ -915,6 +922,155 @@  thread_around_empty_blocks (edge taken_edge,
   return false;
 }
 
+/* Return true if there is at least one path from START_BB to END_BB.
+   VISITED_BBS is used to make sure we don't fall into an infinite loop.  */
+
+static bool
+find_thread_path (basic_block start_bb, basic_block end_bb,
+		    vec<basic_block, va_gc> *&path,
+		    hash_set<basic_block> *visited_bbs)
+{
+  if (start_bb == end_bb)
+    {
+      vec_safe_push (path, start_bb);
+      return true;
+    }
+
+  if (!visited_bbs->add(start_bb))
+    {
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+	if (find_thread_path (e->dest, end_bb, path, visited_bbs))
+	  {
+	    vec_safe_push (path, start_bb);
+	    return true;
+	  }
+    }
+
+  return false;
+}
+
+/* We trace the value of the variable EXPR back through any phi nodes looking
+   for places where it gets a constant value and save the path.  */
+
+static void
+find_control_statement_thread_paths (tree expr,
+				     hash_set<gimple> *visited_phis,
+				     vec<basic_block, va_gc> *&path)
+{
+  tree var = SSA_NAME_VAR (expr);
+  gimple def_stmt = SSA_NAME_DEF_STMT (expr);
+  basic_block var_bb = gimple_bb (def_stmt);
+
+  if (var == NULL || var_bb == NULL)
+    return;
+
+  vec<basic_block, va_gc> *next_path;
+  vec_alloc (next_path, n_basic_blocks_for_fn (cfun));
+
+  basic_block last_bb_in_path = path->last ();
+
+  /* Put the path from var_bb to last_bb_in_path into next_path.  */
+  if (var_bb != last_bb_in_path)
+    {
+      edge e;
+      int e_count = 0;
+      edge_iterator ei;
+
+      FOR_EACH_EDGE (e, ei, last_bb_in_path->preds)
+	{
+	  hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
+
+	  if (find_thread_path (var_bb, e->src, next_path, visited_bbs))
+	    e_count = e_count + 1;
+
+	  delete visited_bbs;
+
+	  /* If there is more than one path, stop.  */
+	  if (e_count > 1)
+	    {
+	      vec_free (next_path);
+	      return;
+	    }
+	}
+    }
+
+  /* Visit PHI nodes once.  */
+  if (gimple_code (def_stmt) != GIMPLE_PHI
+      || visited_phis->add(def_stmt)) {
+    vec_free (next_path);
+    return;
+  }
+
+  /* Append all the nodes from next_path to path.  */
+  vec_safe_splice (path, next_path);
+  gcc_assert (path->last () == var_bb);
+
+  /* Iterate over the arguments of PHI.  */
+  unsigned int i;
+  for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
+    {
+      tree arg = gimple_phi_arg_def (def_stmt, i);
+      basic_block bbi = gimple_phi_arg_edge (def_stmt, i)->src;
+
+      /* Skip edges pointing outside the current loop.  */
+      if (!arg || var_bb->loop_father != bbi->loop_father)
+	continue;
+
+      /* Add BBI to the path.  */
+      vec_safe_push (path, bbi);
+
+      if (TREE_CODE (arg) == INTEGER_CST)
+	{
+	  int j, n = path->length ();
+	  vec<jump_thread_edge *> *jump_thread_path
+	    = new vec<jump_thread_edge *> ();
+	  int joiners = 0;
+
+	  for (j = 0; j < n - 1; j++)
+	    {
+	      edge e = find_edge ((*path)[n - j - 1],
+				  (*path)[n - j - 2]);
+	      gcc_assert (e);
+	      enum jump_thread_edge_type kind;
+
+	      if (j == 0)
+		kind = EDGE_START_JUMP_THREAD;
+	      else if (single_pred_p (e->src))
+		kind = EDGE_NO_COPY_SRC_BLOCK;
+	      else {
+		kind = EDGE_COPY_SRC_JOINER_BLOCK;
+		++joiners;
+	      }
+
+	      jump_thread_edge *x = new jump_thread_edge (e, kind);
+	      jump_thread_path->safe_push (x);
+	    }
+
+	  /* Add the edge taken when the control variable has value ARG.  */
+	  edge taken_edge = find_taken_edge ((*path)[0], arg);
+	  jump_thread_edge *x
+	    = new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK);
+	  jump_thread_path->safe_push (x);
+
+	  /* Thread-update does not handle more than two joiners.  A path with
+	     less than 3 nodes should not be jump-threaded.  */
+	  if (joiners < 2 && n > 2)
+	    register_jump_thread (jump_thread_path);
+	}
+      else if (TREE_CODE (arg) == SSA_NAME)
+	find_control_statement_thread_paths (arg, visited_phis, path);
+
+      /* Remove BBI from the path.  */
+      path->pop ();
+    }
+
+  /* Remove all the nodes that we added from next_path.  */
+  vec_safe_truncate (path, (path->length () - next_path->length ()));
+  vec_free (next_path);
+}
+
 /* We are exiting E->src, see if E->dest ends with a conditional
    jump which has a known value when reached via E.
 
@@ -1000,7 +1156,10 @@  thread_through_normal_block (edge e,
       cond = simplify_control_stmt_condition (e, stmt, dummy_cond, simplify,
 					      handle_dominating_asserts);
 
-      if (cond && is_gimple_min_invariant (cond))
+      if (!cond)
+	return 0;
+
+      if (is_gimple_min_invariant (cond))
 	{
 	  edge taken_edge = find_taken_edge (e->dest, cond);
 	  basic_block dest = (taken_edge ? taken_edge->dest : NULL);
@@ -1046,7 +1205,25 @@  thread_through_normal_block (edge e,
 				      backedge_seen_p);
 	  return 1;
 	}
+
+      if (TREE_CODE (cond) != SSA_NAME
+	  || e->dest->loop_father != e->src->loop_father)
+	return 0;
+
+      /* When COND cannot be simplified, try to find paths from a control
+	 statement back through the PHI nodes which would affect that control
+	 statement.  */
+      vec<basic_block, va_gc> *bb_path;
+      vec_alloc (bb_path, n_basic_blocks_for_fn (cfun));
+      vec_safe_push (bb_path, e->dest);
+      hash_set<gimple> *visited_phis = new hash_set<gimple>;
+
+      find_control_statement_thread_paths (cond, visited_phis, bb_path);
+
+      delete visited_phis;
+      vec_free (bb_path);
+
+      return -1;
     }
   return 0;
 }
-- 
2.1.0.243.g30d45f7