PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)
diff mbox

Message ID 59457e50-6025-4efa-d039-72642cc0b904@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez Feb. 9, 2017, 6:08 p.m. UTC
For those of you not following the PR, this is a re-post of a patch that 
was approved in some form a year+ ago, but was reverted because it 
caused an undiagnosed bootstrap problem on AIX:

https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00421.html

Annoyingly, the AIX problem disappeared with some unrelated changes to 
SCCVN by Richard in more recent GCC's.  Consequently, the patch got 
blocked because we could no longer reproduce the problem, but it wasn't 
entirely clear that it was gone.

As penance for sins in a previous life, I have taken it upon myself to 
reproduce this problem back in time and find what caused the AIX failure 
back then.  I will spare the list the boring process, but the problem 
with the original patch was that it changed the semantics of 
check_simple_exit, but not all of it's indirect callers.  This caused 
two versions of the same loop to have different unsynchronized iteration 
variables-- one in a simple register, and one in a doloop variant on ppc.

I have bootstrapped this patch around trunk@226811 on AIX (all 
languages), which is the latest time AIX failed to bootstrap with the 
original patch.  My testing environment included a handful of other 
unrelated changes that were required to coerce AIX into building 
GCC5-ish with GCC6.1.

I have also bootstrapped and tested this patch at today's trunk on 
x86-64 Linux with no adverse effects.

As I've mentioned, I believe the original patch was previously approved, 
so to aid in reviewing I am including the full patch ("full-patch") with 
my changes on top of the original patch, as well as an incremental patch 
("incremental-patch") with my recent changes to bring AIX to happiness.

I understand if changes to the RTL looping infrastructure are too late 
for this release cycle, and as I am not the original author-- don't kill 
the messenger!  I just want to move this forward, even if it means 
getting approval for GCC8.  This will at the very least allow me to 
sleep at night, knowing I won't have to touch AIX for a while (or at 
least wrt to this PR) ;-).

Original author(s) CC'ed.

Bring it!
Aldy
PR rtl-optimization/64081
	* loop-iv.c (def_pred_latch_p): New function.
	(latch_dominating_def): Allow specific cases with non-single
	definitions.
	(iv_get_reaching_def): Likewise.
	(check_complex_exit_p): New function.
	(check_simple_exit): Use check_complex_exit_p to allow certain cases
	with exits not executing on any iteration.
	Rename to check_loop_exit.
	(find_simple_exit): Rename to find_loop_exit.
	(get_simple_loop_desc): Rename to get_loop_desc.
	* cfgloop.c (get_loop_location): Use get_loop_desc instead of
	get_simple_loop_desc.
	* cfgloop.h (enum loop_exit_type): New.
	(find_loop_exit): New prototype.
	(find_simple_exit): Define as inline to use find_loop_exit.
	(get_loop_desc): New prototype.
	(get_simple_loop_desc): Define as inline to use get_loop_desc.
commit 08859c7279bd159a0a9fe579ae02a5ffd68d34b9
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Thu Feb 9 08:15:10 2017 -0500

    Changes on top of the original patch for 64081 that fix the AIX
    bootstrap failure.  This patch limits the use of check_complex_exit_p
    to select uses of check_simple_exit to avoid generating doloops
    incorrectly.
    
            PR rtl-optimization/64081
            * cfgloop.c (get_loop_location): Use get_loop_desc instead of
            get_simple_loop_desc.
            * cfgloop.h (enum loop_exit_type): New.
            (find_loop_exit): New prototype.
            (find_simple_exit): Define as inline to use find_loop_exit.
            (get_loop_desc): New prototype.
            (get_simple_loop_desc): Define as inline to use get_loop_desc.
            * loop-iv.c (check_simple_exit): Rename to check_loop_exit.
            (find_simple_exit): Rename to find_loop_exit.
            (get_simple_loop_desc): Rename to get_loop_desc.

diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index afd56bb..df9d394 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1742,9 +1742,9 @@ get_loop_location (struct loop *loop)
      of the for or while statement, if possible.  To do this, look
      for the branch guarding the loop back-edge.  */
 
-  /* If this is a simple loop with an in_edge, then the loop control
-     branch is typically at the end of its source.  */
-  desc = get_simple_loop_desc (loop);
+  /* If this is a loop with an in_edge, then the loop control branch
+     is typically at the end of its source.  */
+  desc = get_loop_desc (loop, LOOP_EXIT_COMPLEX);
   if (desc->in_edge)
     {
       FOR_BB_INSNS_REVERSE (desc->in_edge->src, insn)
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 80a1915..583d247 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -460,6 +460,16 @@ struct GTY(()) niter_desc
   rtx niter_expr;
 };
 
+enum loop_exit_type
+{
+  /* Simple exit out of a loop.  */
+  LOOP_EXIT_SIMPLE,
+
+  /* Simple, but slightly more complex loop exit sometimes allowed for
+     further loop analysis.  See check_complex_exit_p.  */
+  LOOP_EXIT_COMPLEX
+};
+
 extern void iv_analysis_loop_init (struct loop *);
 extern bool iv_analyze (rtx_insn *, rtx, struct rtx_iv *);
 extern bool iv_analyze_result (rtx_insn *, rtx, struct rtx_iv *);
@@ -467,12 +477,30 @@ extern bool iv_analyze_expr (rtx_insn *, rtx, machine_mode,
 			     struct rtx_iv *);
 extern rtx get_iv_value (struct rtx_iv *, rtx);
 extern bool biv_p (rtx_insn *, rtx);
-extern void find_simple_exit (struct loop *, struct niter_desc *);
+extern void find_loop_exit (struct loop *, struct niter_desc *,
+			    enum loop_exit_type);
 extern void iv_analysis_done (void);
 
-extern struct niter_desc *get_simple_loop_desc (struct loop *loop);
+extern struct niter_desc *get_loop_desc (struct loop *loop,
+					 enum loop_exit_type);
 extern void free_simple_loop_desc (struct loop *loop);
 
+/* Simple version of find_loop_exit.  */
+
+static inline void
+find_simple_exit (struct loop *loop, struct niter_desc *desc)
+{
+  find_loop_exit (loop, desc, LOOP_EXIT_SIMPLE);
+}
+
+/* Simple version of get_loop_desc.  */
+
+static inline struct niter_desc *
+get_simple_loop_desc (struct loop *loop)
+{
+  return get_loop_desc (loop, LOOP_EXIT_SIMPLE);
+}
+
 static inline struct niter_desc *
 simple_loop_desc (struct loop *loop)
 {
diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index 891ed5c..22603d8 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -2961,11 +2961,14 @@ check_complex_exit_p (struct loop* loop, basic_block bb)
 
 }
 
-/* Checks whether E is a simple exit from LOOP and stores its description
-   into DESC.  */
+/* Checks whether E is an exit from LOOP and stores its description
+   into DESC.  If EXIT_TYPE is LOOP_EXIT_SIMPLE, only simple exits are
+   considered.  If EXIT_TYPE is LOOP_EXIT_COMPLEX, slightly more
+   complex loop exits are considered.  */
 
 static void
-check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
+check_loop_exit (struct loop *loop, edge e, struct niter_desc *desc,
+		 enum loop_exit_type exit_type)
 {
   basic_block exit_bb;
   rtx condition;
@@ -2981,7 +2984,8 @@ check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
 
   /* It must be tested (at least) once during any iteration.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit_bb)
-      && !check_complex_exit_p (loop, exit_bb))
+      && (exit_type == LOOP_EXIT_SIMPLE
+	  || !check_complex_exit_p (loop, exit_bb)))
     return;
 
   /* It must end in a simple conditional jump.  */
@@ -3011,10 +3015,14 @@ check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
   iv_number_of_iterations (loop, at, condition, desc);
 }
 
-/* Finds a simple exit of LOOP and stores its description into DESC.  */
+/* Finds an exit of LOOP and stores its description into DESC.  If
+   EXIT_TYPE is LOOP_EXIT_SIMPLE, only simple exits are considered.
+   If EXIT_TYPE is LOOP_EXIT_COMPLEX, slightly more complex loop exits
+   are considered.  */
 
 void
-find_simple_exit (struct loop *loop, struct niter_desc *desc)
+find_loop_exit (struct loop *loop, struct niter_desc *desc,
+		enum loop_exit_type exit_type)
 {
   unsigned i;
   basic_block *body;
@@ -3033,7 +3041,7 @@ find_simple_exit (struct loop *loop, struct niter_desc *desc)
 	  if (flow_bb_inside_loop_p (loop, e->dest))
 	    continue;
 
-	  check_simple_exit (loop, e, &act);
+	  check_loop_exit (loop, e, &act, exit_type);
 	  if (!act.simple_p)
 	    continue;
 
@@ -3101,11 +3109,13 @@ find_simple_exit (struct loop *loop, struct niter_desc *desc)
   free (body);
 }
 
-/* Creates a simple loop description of LOOP if it was not computed
-   already.  */
+/* Creates a loop description of LOOP if it was not computed already.
+   If EXIT_TYPE is LOOP_EXIT_SIMPLE, only simple exits are considered.
+   If EXIT_TYPE is LOOP_EXIT_COMPLEX, slightly more complex loop exits
+   are considered.  */
 
 struct niter_desc *
-get_simple_loop_desc (struct loop *loop)
+get_loop_desc (struct loop *loop, enum loop_exit_type exit_type)
 {
   struct niter_desc *desc = simple_loop_desc (loop);
 
@@ -3116,7 +3126,7 @@ get_simple_loop_desc (struct loop *loop)
      find_simple_loop_exit.  */
   desc = ggc_cleared_alloc<niter_desc> ();
   iv_analysis_loop_init (loop);
-  find_simple_exit (loop, desc);
+  find_loop_exit (loop, desc, exit_type);
   loop->simple_loop_desc = desc;
   return desc;
 }

Comments

Jeff Law Feb. 14, 2017, 12:15 a.m. UTC | #1
On 02/09/2017 11:08 AM, Aldy Hernandez wrote:
> For those of you not following the PR, this is a re-post of a patch that
> was approved in some form a year+ ago, but was reverted because it
> caused an undiagnosed bootstrap problem on AIX:
>
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00421.html
>
> Annoyingly, the AIX problem disappeared with some unrelated changes to
> SCCVN by Richard in more recent GCC's.  Consequently, the patch got
> blocked because we could no longer reproduce the problem, but it wasn't
> entirely clear that it was gone.
>
> As penance for sins in a previous life, I have taken it upon myself to
> reproduce this problem back in time and find what caused the AIX failure
> back then.  I will spare the list the boring process, but the problem
> with the original patch was that it changed the semantics of
> check_simple_exit, but not all of it's indirect callers.  This caused
> two versions of the same loop to have different unsynchronized iteration
> variables-- one in a simple register, and one in a doloop variant on ppc.
>
> I have bootstrapped this patch around trunk@226811 on AIX (all
> languages), which is the latest time AIX failed to bootstrap with the
> original patch.  My testing environment included a handful of other
> unrelated changes that were required to coerce AIX into building
> GCC5-ish with GCC6.1.
>
> I have also bootstrapped and tested this patch at today's trunk on
> x86-64 Linux with no adverse effects.
>
> As I've mentioned, I believe the original patch was previously approved,
> so to aid in reviewing I am including the full patch ("full-patch") with
> my changes on top of the original patch, as well as an incremental patch
> ("incremental-patch") with my recent changes to bring AIX to happiness.
>
> I understand if changes to the RTL looping infrastructure are too late
> for this release cycle, and as I am not the original author-- don't kill
> the messenger!  I just want to move this forward, even if it means
> getting approval for GCC8.  This will at the very least allow me to
> sleep at night, knowing I won't have to touch AIX for a while (or at
> least wrt to this PR) ;-).
>
> Original author(s) CC'ed.
So it seems in your updated patch there is only one call where we ask 
for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location.

But I don't see how asking for LOOP_EXIT_COMPLEX from that location 
would change whether or not we unroll any given loop (which is the core 
of bz64081).

Am I missing something?

Jeff
Aldy Hernandez Feb. 15, 2017, 10:49 a.m. UTC | #2
On 02/13/2017 07:15 PM, Jeff Law wrote:

> So it seems in your updated patch there is only one call where we ask
> for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location.
>
> But I don't see how asking for LOOP_EXIT_COMPLEX from that location
> would change whether or not we unroll any given loop (which is the core
> of bz64081).
>
> Am I missing something?

Ughh, only the spaghetti that is this code? ;-).

get_loop_location is only called once in the compiler, in 
decide_unrolling().  This call to get_loop_location() will set the loop 
description, particularly desc->simple_p where you point out.

Later on down in decide_unrolling(), we decide the number of iterations, 
and use desc->simple_p to ignore the loop if it is not simple.

       decide_unroll_constant_iterations (loop, flags);
       if (loop->lpt_decision.decision == LPT_NONE)
	decide_unroll_runtime_iterations (loop, flags);
       if (loop->lpt_decision.decision == LPT_NONE)
	decide_unroll_stupid (loop, flags);

Any one of these functions will bail if the loop description was not 
simple_p:

   /* Check for simple loops.  */
   desc = get_simple_loop_desc (loop);

   /* Check simpleness.  */
   if (!desc->simple_p || desc->assumptions)
     {
       if (dump_file)
	fprintf (dump_file,
		 ";; Unable to prove that the number of iterations "
		 "can be counted in runtime\n");
       return;
     }

(Yes, there's a lot of duplicated code in decide_unroll_*_iterations.)

Now a problem I see here is that decide_unroll_*_iterations all call 
get_simple_loop_desc() which is basically LOOP_EXIT_SIMPLE, but since 
the value is already cached we return the previous call that was 
LOOP_EXIT_COMPLEX.  So the code works because we will already have a 
cached value.

I think to make it clearer we could:

1. Add an assert in get_loop_desc to make sure that if we're returning a 
cached loop description, that the LOOP_EXIT_TYPEs match.  Just in case...

2. Change all the decide_unroll_*_iterations variants to specifically 
ask for a LOOP_EXIT_TYPE, not just  the simple variant. And have this 
set to LOOP_EXIT_COMPLEX from decide_unrolling.  Right now, this is all 
working because we have only one call to get_loop_location, but I assume 
that could change.

3. And finally, what the heck is get_loop_location doing in cfgloop, 
when it's only used once within loop-unroll.c?  I say we move it to 
loop-unroll.c and mark it static.

Does this help?

Aldy
Jeff Law Feb. 23, 2017, 7:20 a.m. UTC | #3
On 02/15/2017 03:49 AM, Aldy Hernandez wrote:
> On 02/13/2017 07:15 PM, Jeff Law wrote:
>
>> So it seems in your updated patch there is only one call where we ask
>> for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location.
>>
>> But I don't see how asking for LOOP_EXIT_COMPLEX from that location
>> would change whether or not we unroll any given loop (which is the core
>> of bz64081).
>>
>> Am I missing something?
>
> Ughh, only the spaghetti that is this code? ;-).
>
> get_loop_location is only called once in the compiler, in
> decide_unrolling().  This call to get_loop_location() will set the loop
> description, particularly desc->simple_p where you point out.
The "desc" persists, hanging off the loop structure and its the 
existence (or lack thereof) which enables/disables unrolling.

ie, we ask for complex exits via the get_loop_location calls.  So loops 
with complex exits will get a DESC and thus will get unrolled.  Other 
unpleasant things may happen as a result of having a cached DESC as well.

If we're going to go this direction, I think we need to store the exit 
type in the DESC, then verify that if we ask for simple exits, then only 
give back a DESC for loops with simple exits.  If we ask for complex 
exits, then we can give back any cached DESC.

Then every instance where we ask for a desc has to be checked for 
whether or not it is safe in the presence of complex exits and ask for 
the appropriate style.




> Later on down in decide_unrolling(), we decide the number of iterations,
> and use desc->simple_p to ignore the loop if it is not simple.
>
>       decide_unroll_constant_iterations (loop, flags);
>       if (loop->lpt_decision.decision == LPT_NONE)
>     decide_unroll_runtime_iterations (loop, flags);
>       if (loop->lpt_decision.decision == LPT_NONE)
>     decide_unroll_stupid (loop, flags);
>
> Any one of these functions will bail if the loop description was not
> simple_p:
I don't think so.  If that were the case, then we'd never unroll the 
loop in the testcase.  In fact, decide_unroll_constant_iterations 
unrolls the loop with the complex exit.



>
> Now a problem I see here is that decide_unroll_*_iterations all call
> get_simple_loop_desc() which is basically LOOP_EXIT_SIMPLE, but since
> the value is already cached we return the previous call that was
> LOOP_EXIT_COMPLEX.  So the code works because we will already have a
> cached value.
Right.  And ISTM it's the existence of the DESC structure that's the key 
here.  Right?

>
> I think to make it clearer we could:
>
> 1. Add an assert in get_loop_desc to make sure that if we're returning a
> cached loop description, that the LOOP_EXIT_TYPEs match.  Just in case...
Probably not a bad idea.  But you don't save the loop exit type AFAICT. 
In fact, if you did, you'd quickly see that the calls via decide_* would 
likely trigger your assert.



>
> 2. Change all the decide_unroll_*_iterations variants to specifically
> ask for a LOOP_EXIT_TYPE, not just  the simple variant. And have this
> set to LOOP_EXIT_COMPLEX from decide_unrolling.  Right now, this is all
> working because we have only one call to get_loop_location, but I assume
> that could change.
But the simple variants all ask for LOOP_EXIT_SIMPLE, so I don't see 
this as improving things significantly.

>
> 3. And finally, what the heck is get_loop_location doing in cfgloop,
> when it's only used once within loop-unroll.c?  I say we move it to
> loop-unroll.c and mark it static.
Seems like a reasonable cleanup.


>
> Does this help?
It does.  But it really shows that the introduction of simple vs complex 
exits is messed up.  All we've done is add another layer of complexity 
that happens to work IMHO, but it seems to do so more by accident than 
design.

Jeff

Patch
diff mbox

diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index afd56bb..df9d394 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1742,9 +1742,9 @@  get_loop_location (struct loop *loop)
      of the for or while statement, if possible.  To do this, look
      for the branch guarding the loop back-edge.  */
 
-  /* If this is a simple loop with an in_edge, then the loop control
-     branch is typically at the end of its source.  */
-  desc = get_simple_loop_desc (loop);
+  /* If this is a loop with an in_edge, then the loop control branch
+     is typically at the end of its source.  */
+  desc = get_loop_desc (loop, LOOP_EXIT_COMPLEX);
   if (desc->in_edge)
     {
       FOR_BB_INSNS_REVERSE (desc->in_edge->src, insn)
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 80a1915..583d247 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -460,6 +460,16 @@  struct GTY(()) niter_desc
   rtx niter_expr;
 };
 
+enum loop_exit_type
+{
+  /* Simple exit out of a loop.  */
+  LOOP_EXIT_SIMPLE,
+
+  /* Simple, but slightly more complex loop exit sometimes allowed for
+     further loop analysis.  See check_complex_exit_p.  */
+  LOOP_EXIT_COMPLEX
+};
+
 extern void iv_analysis_loop_init (struct loop *);
 extern bool iv_analyze (rtx_insn *, rtx, struct rtx_iv *);
 extern bool iv_analyze_result (rtx_insn *, rtx, struct rtx_iv *);
@@ -467,12 +477,30 @@  extern bool iv_analyze_expr (rtx_insn *, rtx, machine_mode,
 			     struct rtx_iv *);
 extern rtx get_iv_value (struct rtx_iv *, rtx);
 extern bool biv_p (rtx_insn *, rtx);
-extern void find_simple_exit (struct loop *, struct niter_desc *);
+extern void find_loop_exit (struct loop *, struct niter_desc *,
+			    enum loop_exit_type);
 extern void iv_analysis_done (void);
 
-extern struct niter_desc *get_simple_loop_desc (struct loop *loop);
+extern struct niter_desc *get_loop_desc (struct loop *loop,
+					 enum loop_exit_type);
 extern void free_simple_loop_desc (struct loop *loop);
 
+/* Simple version of find_loop_exit.  */
+
+static inline void
+find_simple_exit (struct loop *loop, struct niter_desc *desc)
+{
+  find_loop_exit (loop, desc, LOOP_EXIT_SIMPLE);
+}
+
+/* Simple version of get_loop_desc.  */
+
+static inline struct niter_desc *
+get_simple_loop_desc (struct loop *loop)
+{
+  return get_loop_desc (loop, LOOP_EXIT_SIMPLE);
+}
+
 static inline struct niter_desc *
 simple_loop_desc (struct loop *loop)
 {
diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index 896fe0b1..22603d8 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -290,9 +290,27 @@  iv_analysis_loop_init (struct loop *loop)
   check_iv_ref_table_size ();
 }
 
+/* Checks that def is in an immediate predecessor of the latch block.  */
+
+static bool
+def_pred_latch_p (df_ref d_ref)
+{
+  basic_block bb = DF_REF_BB (d_ref);
+  edge_iterator ei;
+  edge e;
+
+  FOR_EACH_EDGE (e, ei, current_loop->latch->preds)
+    {
+      if (e->src == bb)
+	return true;
+    }
+  return false;
+}
+
 /* Finds the definition of REG that dominates loop latch and stores
    it to DEF.  Returns false if there is not a single definition
-   dominating the latch.  If REG has no definition in loop, DEF
+   dominating the latch or all defs are same and they are on different
+   predecessors of loop latch.  If REG has no definition in loop, DEF
    is set to NULL and true is returned.  */
 
 static bool
@@ -304,15 +322,28 @@  latch_dominating_def (rtx reg, df_ref *def)
 
   for (adef = DF_REG_DEF_CHAIN (regno); adef; adef = DF_REF_NEXT_REG (adef))
     {
+      /* Initialize this to true for the very first iteration when
+	 SINGLE_RD is NULL.  */
+      bool def_pred_latch = true;
+
       if (!bitmap_bit_p (df->blocks_to_analyze, DF_REF_BBNO (adef))
 	  || !bitmap_bit_p (&bb_info->out, DF_REF_ID (adef)))
 	continue;
 
-      /* More than one reaching definition.  */
-      if (single_rd)
+      /* More than one reaching definition is ok in case definitions are
+	 in predecessors of latch block and those definitions are the same.
+	 Probably this could be relaxed and check for sub-dominance instead
+	 predecessor.  */
+      if (single_rd
+	  && (!(def_pred_latch = def_pred_latch_p (adef))
+	      || !rtx_equal_p( PATTERN (DF_REF_INSN (single_rd)),
+			       PATTERN (DF_REF_INSN (adef)))))
 	return false;
 
-      if (!just_once_each_iteration_p (current_loop, DF_REF_BB (adef)))
+      /* It's ok if def is not executed on each iteration once we have defs on
+	 latch's predecessors.  */
+      if (!just_once_each_iteration_p (current_loop, DF_REF_BB (adef))
+	  && !def_pred_latch)
 	return false;
 
       single_rd = adef;
@@ -327,10 +358,10 @@  latch_dominating_def (rtx reg, df_ref *def)
 static enum iv_grd_result
 iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
 {
-  df_ref use, adef;
+  df_ref use, adef = NULL;
   basic_block def_bb, use_bb;
   rtx_insn *def_insn;
-  bool dom_p;
+  bool dom_p, dom_latch_p = false;
 
   *def = NULL;
   if (!simple_reg_p (reg))
@@ -345,11 +376,26 @@  iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
   if (!DF_REF_CHAIN (use))
     return GRD_INVARIANT;
 
-  /* More than one reaching def.  */
+  adef = DF_REF_CHAIN (use)->ref;
+  /* Having more than one reaching def is ok once all defs are in blocks which
+     are latch's predecessors.  */
   if (DF_REF_CHAIN (use)->next)
-    return GRD_INVALID;
+    {
+      df_link* defs;
+      unsigned int def_num = 0;
 
-  adef = DF_REF_CHAIN (use)->ref;
+      for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
+	{
+	  if (!def_pred_latch_p (defs->ref))
+	    return GRD_INVALID;
+	  def_num++;
+	}
+      /* Make sure all preds contain definitions.  */
+      if (def_num != EDGE_COUNT (current_loop->latch->preds))
+	return GRD_INVALID;
+
+      dom_latch_p = true;
+    }
 
   /* We do not handle setting only part of the register.  */
   if (DF_REF_FLAGS (adef) & DF_REF_READ_WRITE)
@@ -372,8 +418,8 @@  iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
 
   /* The definition does not dominate the use.  This is still OK if
      this may be a use of a biv, i.e. if the def_bb dominates loop
-     latch.  */
-  if (just_once_each_iteration_p (current_loop, def_bb))
+     latch or all defs are in latch's predecessors.  */
+  if (dom_latch_p || just_once_each_iteration_p (current_loop, def_bb))
     return GRD_MAYBE_BIV;
 
   return GRD_INVALID;
@@ -2872,11 +2918,57 @@  fail:
   return;
 }
 
-/* Checks whether E is a simple exit from LOOP and stores its description
-   into DESC.  */
+/* Check whether exit is not simple but still good for further analysis.
+   Looks like such loops mostly are a result of jump threading.  */
+
+static bool
+check_complex_exit_p (struct loop* loop, basic_block bb)
+{
+  edge e;
+  basic_block pred, exit;
+
+  if (EDGE_COUNT (bb->preds) > 1)
+    return false;
+
+  e = EDGE_PRED (bb, 0);
+
+  pred = e->src;
+  if (EDGE_COUNT (pred->succs) != 2)
+    return false;
+
+  /* Predecessor must be tested (at least) once during any iteration.  */
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, pred))
+    return false;
+
+  if (EDGE_SUCC (pred, 0)->dest == bb)
+    exit = EDGE_SUCC (pred, 1)->dest;
+  else
+    exit = EDGE_SUCC (pred, 0)->dest;
+
+  /* Check that EXIT is really loop exit.  */
+  if (flow_bb_inside_loop_p (loop, exit))
+    {
+      edge_iterator eei;
+      edge ee;
+
+      FOR_EACH_EDGE (ee, eei, exit->succs)
+	{
+	  if (!flow_bb_inside_loop_p (loop, ee->dest))
+	    return true;
+	}
+    }
+  return false;
+
+}
+
+/* Checks whether E is an exit from LOOP and stores its description
+   into DESC.  If EXIT_TYPE is LOOP_EXIT_SIMPLE, only simple exits are
+   considered.  If EXIT_TYPE is LOOP_EXIT_COMPLEX, slightly more
+   complex loop exits are considered.  */
 
 static void
-check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
+check_loop_exit (struct loop *loop, edge e, struct niter_desc *desc,
+		 enum loop_exit_type exit_type)
 {
   basic_block exit_bb;
   rtx condition;
@@ -2891,7 +2983,9 @@  check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
     return;
 
   /* It must be tested (at least) once during any iteration.  */
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit_bb))
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit_bb)
+      && (exit_type == LOOP_EXIT_SIMPLE
+	  || !check_complex_exit_p (loop, exit_bb)))
     return;
 
   /* It must end in a simple conditional jump.  */
@@ -2921,10 +3015,14 @@  check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
   iv_number_of_iterations (loop, at, condition, desc);
 }
 
-/* Finds a simple exit of LOOP and stores its description into DESC.  */
+/* Finds an exit of LOOP and stores its description into DESC.  If
+   EXIT_TYPE is LOOP_EXIT_SIMPLE, only simple exits are considered.
+   If EXIT_TYPE is LOOP_EXIT_COMPLEX, slightly more complex loop exits
+   are considered.  */
 
 void
-find_simple_exit (struct loop *loop, struct niter_desc *desc)
+find_loop_exit (struct loop *loop, struct niter_desc *desc,
+		enum loop_exit_type exit_type)
 {
   unsigned i;
   basic_block *body;
@@ -2943,7 +3041,7 @@  find_simple_exit (struct loop *loop, struct niter_desc *desc)
 	  if (flow_bb_inside_loop_p (loop, e->dest))
 	    continue;
 
-	  check_simple_exit (loop, e, &act);
+	  check_loop_exit (loop, e, &act, exit_type);
 	  if (!act.simple_p)
 	    continue;
 
@@ -3011,11 +3109,13 @@  find_simple_exit (struct loop *loop, struct niter_desc *desc)
   free (body);
 }
 
-/* Creates a simple loop description of LOOP if it was not computed
-   already.  */
+/* Creates a loop description of LOOP if it was not computed already.
+   If EXIT_TYPE is LOOP_EXIT_SIMPLE, only simple exits are considered.
+   If EXIT_TYPE is LOOP_EXIT_COMPLEX, slightly more complex loop exits
+   are considered.  */
 
 struct niter_desc *
-get_simple_loop_desc (struct loop *loop)
+get_loop_desc (struct loop *loop, enum loop_exit_type exit_type)
 {
   struct niter_desc *desc = simple_loop_desc (loop);
 
@@ -3026,7 +3126,7 @@  get_simple_loop_desc (struct loop *loop)
      find_simple_loop_exit.  */
   desc = ggc_cleared_alloc<niter_desc> ();
   iv_analysis_loop_init (loop);
-  find_simple_exit (loop, desc);
+  find_loop_exit (loop, desc, exit_type);
   loop->simple_loop_desc = desc;
   return desc;
 }
diff --git a/gcc/testsuite/gcc.dg/pr64081.c b/gcc/testsuite/gcc.dg/pr64081.c
new file mode 100644
index 0000000..7b20ed0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr64081.c
@@ -0,0 +1,40 @@ 
+/* PR rtl-optimization/64081 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -funroll-loops -fdump-rtl-loop2_unroll" } */
+
+long token;
+long *data;
+long *arr1;
+long *arr2;
+
+int main (int argc, const char* argv[])
+{
+  unsigned long i;
+  static long pos, start, count;
+  static int dir;
+
+  pos = 0;
+  dir = 1;
+
+  for (i = 0; i < argc; i++)
+    {
+      start = pos;
+      for (count = 0; count <= 400; count++)
+	{
+	  if (token == data[pos])
+	    break;
+	  if (dir == 1)
+	    pos = arr1[pos];
+	  else
+	    pos = arr2[pos];
+	  if (pos == -1)
+	    {
+	      pos = start;
+	      dir = !dir;
+	    }
+	}
+    }
+  return pos + dir;
+}
+
+/* { dg-final { scan-rtl-dump "loop unrolled 7 times" "loop2_unroll" } } */