diff mbox series

[parloops,PR83126] , Use cached affine_ivs canonicalize_loop_ivs

Message ID 72456add-0339-35cf-c9d6-6aade4af6050@mentor.com
State New
Headers show
Series [parloops,PR83126] , Use cached affine_ivs canonicalize_loop_ivs | expand

Commit Message

Tom de Vries Feb. 22, 2018, 12:08 p.m. UTC
Hi,

this patch fixes an ICE in the parloops pass.

The ICE (when compiling the test-case in attached patch) follows from 
the fact that here in gen_parallel_loop the call to 
canonicalize_loop_ivs fails to "base all the induction variables in LOOP 
on a single control one":
...
   /* Base all the induction variables in LOOP on a single control one.*/
   canonicalize_loop_ivs (loop, &nit, true);
...

This is caused by the fact that for one of the induction variables, 
simple_iv no longer returns true (as was the case at the start of 
gen_parallel_loop). This seems to be triggered by the fact that during 
loop_version we call scev_reset (although I'm not sure why when 
recalculating scev info we're not reaching the same conclusion as before).

The patch fixes the ICE by caching the affine_ivs as calculated by 
simple_iv before calling loop_version, and reusing those in 
canonicalize_loop_ivs (instead of calling simple_iv again).

Bootstrapped and reg-tested on x86_64.

OK for stage1?

I'm not sure if this is minimal/low-impact enough for stage4. If not, I 
can rework the bit of the patch that adds an assert after 
canonicalize_loop_ivs into a patch that aborts the optimization instead 
of ICE-ing.

Thanks,
- Tom

Comments

Richard Biener March 12, 2018, 12:14 p.m. UTC | #1
On Thu, 22 Feb 2018, Tom de Vries wrote:

> Hi,
> 
> this patch fixes an ICE in the parloops pass.
> 
> The ICE (when compiling the test-case in attached patch) follows from the fact
> that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
> "base all the induction variables in LOOP on a single control one":
> ...
>   /* Base all the induction variables in LOOP on a single control one.*/
>   canonicalize_loop_ivs (loop, &nit, true);
> ...
> 
> This is caused by the fact that for one of the induction variables, simple_iv
> no longer returns true (as was the case at the start of gen_parallel_loop).
> This seems to be triggered by the fact that during loop_version we call
> scev_reset (although I'm not sure why when recalculating scev info we're not
> reaching the same conclusion as before).

I guess the real reason is that canonicalize_loop_ivs calls
create_iv first which in the parloop case with bump-in-latch true
and then incrementally re-writes PHIs, eventually wrecking other
PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
first one but the rewritten ones depend on the new IV already.

So the better fix (maybe not 100% enough) would be to re-organize
canonicalize_loop_ivs to first do analysis of all PHIs and then
rewrite them all.

> 
> The patch fixes the ICE by caching the affine_ivs as calculated by simple_iv
> before calling loop_version, and reusing those in canonicalize_loop_ivs
> (instead of calling simple_iv again).
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage1?

First of all please to not use std::map but instead our own hash_map,
the proper way of including <map> would be to define INCLUDE_MAP
before the system.h include.

But see above.

> I'm not sure if this is minimal/low-impact enough for stage4. If not, I can
> rework the bit of the patch that adds an assert after canonicalize_loop_ivs
> into a patch that aborts the optimization instead of ICE-ing.

I think that's something reasonable anyway.

Thanks,
Richard.

> Thanks,
> - Tom
>
Tom de Vries March 21, 2018, 10:37 a.m. UTC | #2
On 03/12/2018 01:14 PM, Richard Biener wrote:
> On Thu, 22 Feb 2018, Tom de Vries wrote:
>> I can
>> rework the bit of the patch that adds an assert after canonicalize_loop_ivs
>> into a patch that aborts the optimization instead of ICE-ing.
> 
> I think that's something reasonable anyway.
> 

Patch attached below implements that approach.

The difference between before parloops, and after ompexpssa2 is this 
(showing the effects of canonicalize_loop_ivs):
...
$ diff -u pr83126.c.156t.dce5 pr83126.c.158t.ompexpssa2
    ...
  ew (short unsigned int c9, int stuff)
  {
    int * fd;
@@ -9,53 +103,53 @@
    int _3;
    short unsigned int _4;
    unsigned int _5;
+  unsigned int ivtmp_8;
    unsigned int _22;
    unsigned int ivtmp_24;
    unsigned int ivtmp_26;
+  unsigned int ivtmp_28;

    <bb 2> [local count: 11811]:

    <bb 3> [local count: 118111601]:
-  # c9_1 = PHI <c9_11(D)(2), c9_21(11)>
+  # c9_1 = PHI <c9_11(D)(2), c9_21(7)>
    _2 = (long int) c9_1;
    fd_13 = (int *) _2;
    _3 = *fd_13;

    <bb 4> [local count: 1073741825]:
    if (_3 != 0)
-    goto <bb 6>; [11.00%]
+    goto <bb 15>; [11.00%]
    else
      goto <bb 9>; [89.00%]

    <bb 9> [local count: 955630225]:
    goto <bb 4>; [100.00%]

-  <bb 6> [local count: 118111600]:
+  <bb 15> [local count: 94489281]:

-  <bb 10> [local count: 118111600]:
-
-  <bb 5> [local count: 236258638]:
-  # _22 = PHI <0(10), _5(8)>
-  # c9_23 = PHI <c9_1(10), c9_14(8)>
-  # e1_27 = PHI <0(10), e1_17(8)>
-  # ivtmp_26 = PHI <2(10), ivtmp_24(8)>
+  <bb 5> [local count: 189006911]:
+  # c9_23 = PHI <c9_14(8), c9_1(15)>
+  # e1_27 = PHI <e1_17(8), 0(15)>
+  # ivtmp_8 = PHI <ivtmp_28(8), 0(15)>
+  _22 = ivtmp_8;
+  ivtmp_26 = 2 - ivtmp_8;
    _4 = (short unsigned int) e1_27;
    c9_14 = _4 * c9_23;
    _5 = _22 + 1;
    e1_17 = (int) _5;
    ivtmp_24 = ivtmp_26 - 1;
-  if (ivtmp_24 != 0)
+  if (ivtmp_8 < 1)
      goto <bb 8>; [66.67%]
    else
      goto <bb 7>; [33.33%]

-  <bb 8> [local count: 157513634]:
+  <bb 8> [local count: 126010908]:
+  ivtmp_28 = ivtmp_8 + 1;
    goto <bb 5>; [100.00%]

    <bb 7> [local count: 78745004]:
    # c9_21 = PHI <c9_14(5)>
-
-  <bb 11> [local count: 78745004]:
    goto <bb 3>; [100.00%]

  }
...


Bootstrapped and reg-tested on x86_64.

OK for stage4 trunk?

Thanks,
- Tom
[parloops] Handle canonicalize_loop_ivs failure

2018-03-19  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/83126
	* tree-parloops.c (num_phis): New function.
	(gen_parallel_loop): Detect and handle canonicalize_loop_ivs failure.

	* gcc.dg/graphite/pr83126.c: New test.

---
 gcc/testsuite/gcc.dg/graphite/pr83126.c | 19 ++++++++++++++++
 gcc/tree-parloops.c                     | 39 +++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/graphite/pr83126.c b/gcc/testsuite/gcc.dg/graphite/pr83126.c
new file mode 100644
index 0000000..663d059
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr83126.c
@@ -0,0 +1,19 @@
+/* { dg-additional-options "-w -ftree-parallelize-loops=2 -floop-parallelize-all -O1" }  */
+
+void
+ew (unsigned short int c9, int stuff)
+{
+  int e1;
+
+  for (;;)
+    {
+      unsigned int *by = &e1;
+      int *fd = &stuff;
+
+      *fd = c9;
+      fd = *fd;
+      if (*fd != 0)
+	for (*by = 0; *by < 2; ++*by)
+	  c9 *= e1;
+    }
+}
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e44ad5e..3a788cc 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2235,6 +2235,25 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   calculate_dominance_info (CDI_DOMINATORS);
 }
 
+/* Return number of phis in bb.  If COUNT_VIRTUAL_P is false, don't count the
+   virtual phi.  */
+
+static unsigned int
+num_phis (basic_block bb, bool count_virtual_p)
+{
+  unsigned int nr_phis = 0;
+  gphi_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      if (!count_virtual_p && virtual_operand_p (PHI_RESULT (gsi.phi ())))
+	continue;
+
+      nr_phis++;
+    }
+
+  return nr_phis;
+}
+
 /* Generates code to execute the iterations of LOOP in N_THREADS
    threads in parallel, which can be 0 if that number is to be determined
    later.
@@ -2370,6 +2389,26 @@ gen_parallel_loop (struct loop *loop,
 
   /* Base all the induction variables in LOOP on a single control one.  */
   canonicalize_loop_ivs (loop, &nit, true);
+  if (num_phis (loop->header, false) != reduction_list->elements () + 1)
+    {
+      /* The call to canonicalize_loop_ivs above failed to "base all the
+	 induction variables in LOOP on a single control one".  Do damage
+	 control.  */
+      basic_block preheader = loop_preheader_edge (loop)->src;
+      basic_block cond_bb = single_pred (preheader);
+      gcond *cond = as_a <gcond *> (gsi_stmt (gsi_last_bb (cond_bb)));
+      gimple_cond_make_true (cond);
+      update_stmt (cond);
+      /* We've gotten rid of the duplicate loop created by loop_version, but
+	 we can't undo whatever canonicalize_loop_ivs has done.
+	 TODO: Fix this properly by ensuring that the call to
+	 canonicalize_loop_ivs succeeds.  */
+      if (dump_file
+	  && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "canonicalize_loop_ivs failed for loop %d,"
+		 " aborting transformation\n", loop->num);
+      return;
+    }
 
   /* Ensure that the exit condition is the first statement in the loop.
      The common case is that latch of the loop is empty (apart from the
Richard Biener March 21, 2018, 10:44 a.m. UTC | #3
On Wed, 21 Mar 2018, Tom de Vries wrote:

> On 03/12/2018 01:14 PM, Richard Biener wrote:
> > On Thu, 22 Feb 2018, Tom de Vries wrote:
> > > I can
> > > rework the bit of the patch that adds an assert after
> > > canonicalize_loop_ivs
> > > into a patch that aborts the optimization instead of ICE-ing.
> > 
> > I think that's something reasonable anyway.
> > 
> 
> Patch attached below implements that approach.
> 
> The difference between before parloops, and after ompexpssa2 is this (showing
> the effects of canonicalize_loop_ivs):
> ...
> $ diff -u pr83126.c.156t.dce5 pr83126.c.158t.ompexpssa2
>    ...
>  ew (short unsigned int c9, int stuff)
>  {
>    int * fd;
> @@ -9,53 +103,53 @@
>    int _3;
>    short unsigned int _4;
>    unsigned int _5;
> +  unsigned int ivtmp_8;
>    unsigned int _22;
>    unsigned int ivtmp_24;
>    unsigned int ivtmp_26;
> +  unsigned int ivtmp_28;
> 
>    <bb 2> [local count: 11811]:
> 
>    <bb 3> [local count: 118111601]:
> -  # c9_1 = PHI <c9_11(D)(2), c9_21(11)>
> +  # c9_1 = PHI <c9_11(D)(2), c9_21(7)>
>    _2 = (long int) c9_1;
>    fd_13 = (int *) _2;
>    _3 = *fd_13;
> 
>    <bb 4> [local count: 1073741825]:
>    if (_3 != 0)
> -    goto <bb 6>; [11.00%]
> +    goto <bb 15>; [11.00%]
>    else
>      goto <bb 9>; [89.00%]
> 
>    <bb 9> [local count: 955630225]:
>    goto <bb 4>; [100.00%]
> 
> -  <bb 6> [local count: 118111600]:
> +  <bb 15> [local count: 94489281]:
> 
> -  <bb 10> [local count: 118111600]:
> -
> -  <bb 5> [local count: 236258638]:
> -  # _22 = PHI <0(10), _5(8)>
> -  # c9_23 = PHI <c9_1(10), c9_14(8)>
> -  # e1_27 = PHI <0(10), e1_17(8)>
> -  # ivtmp_26 = PHI <2(10), ivtmp_24(8)>
> +  <bb 5> [local count: 189006911]:
> +  # c9_23 = PHI <c9_14(8), c9_1(15)>
> +  # e1_27 = PHI <e1_17(8), 0(15)>
> +  # ivtmp_8 = PHI <ivtmp_28(8), 0(15)>
> +  _22 = ivtmp_8;
> +  ivtmp_26 = 2 - ivtmp_8;
>    _4 = (short unsigned int) e1_27;
>    c9_14 = _4 * c9_23;
>    _5 = _22 + 1;
>    e1_17 = (int) _5;
>    ivtmp_24 = ivtmp_26 - 1;
> -  if (ivtmp_24 != 0)
> +  if (ivtmp_8 < 1)
>      goto <bb 8>; [66.67%]
>    else
>      goto <bb 7>; [33.33%]
> 
> -  <bb 8> [local count: 157513634]:
> +  <bb 8> [local count: 126010908]:
> +  ivtmp_28 = ivtmp_8 + 1;
>    goto <bb 5>; [100.00%]
> 
>    <bb 7> [local count: 78745004]:
>    # c9_21 = PHI <c9_14(5)>
> -
> -  <bb 11> [local count: 78745004]:
>    goto <bb 3>; [100.00%]
> 
>  }
> ...
> 
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage4 trunk?

OK.

Thanks,
Richard.
Tom de Vries March 21, 2018, 3:06 p.m. UTC | #4
On 03/12/2018 01:14 PM, Richard Biener wrote:
> On Thu, 22 Feb 2018, Tom de Vries wrote:
> 
>> Hi,
>>
>> this patch fixes an ICE in the parloops pass.
>>
>> The ICE (when compiling the test-case in attached patch) follows from the fact
>> that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
>> "base all the induction variables in LOOP on a single control one":
>> ...
>>    /* Base all the induction variables in LOOP on a single control one.*/
>>    canonicalize_loop_ivs (loop, &nit, true);
>> ...
>>
>> This is caused by the fact that for one of the induction variables, simple_iv
>> no longer returns true (as was the case at the start of gen_parallel_loop).
>> This seems to be triggered by the fact that during loop_version we call
>> scev_reset (although I'm not sure why when recalculating scev info we're not
>> reaching the same conclusion as before).
> I guess the real reason is that canonicalize_loop_ivs calls
> create_iv first which in the parloop case with bump-in-latch true
> and then incrementally re-writes PHIs, eventually wrecking other
> PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
> first one but the rewritten ones depend on the new IV already.
> 
> So the better fix (maybe not 100% enough) would be to re-organize
> canonicalize_loop_ivs to first do analysis of all PHIs and then
> rewrite them all.
> 

Focusing on the re-organize first, is this sort of what you had in mind?

Bootstrapped and reg-tested on x86_64.

Thanks,
- Tom
Split canonicalize_loop_ivs in analyze and rewrite parts

2018-03-21  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-loop-manip.c (rewrite_phi_with_iv): Remove loop parameter.
	Add iv parameter.  Move early-out tests ...
	(rewrite_all_phi_nodes_with_iv): ... here.  Remove loop argument and add
	iv argument to rewrite_phi_with_iv call.
	(get_all_phi_affine_ivs): New function.
	(canonicalize_loop_ivs): Call get_all_phi_affine_ivs and pass result as
	argument to rewrite_all_phi_nodes_with_iv.

---
 gcc/tree-ssa-loop-manip.c | 82 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 61 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index bf425af..15a5795 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-inline.h"
+#include "hash-map.h"
 
 /* All bitmaps for rewriting into loop-closed SSA go on this obstack,
    so that we can free them all at once.  */
@@ -1430,48 +1431,65 @@ tree_unroll_loop (struct loop *loop, unsigned factor,
    induction variable MAIN_IV and insert the generated code at GSI.  */
 
 static void
-rewrite_phi_with_iv (loop_p loop,
-		     gphi_iterator *psi,
+rewrite_phi_with_iv (gphi_iterator *psi,
 		     gimple_stmt_iterator *gsi,
-		     tree main_iv)
+		     tree main_iv, affine_iv *iv)
 {
-  affine_iv iv;
   gassign *stmt;
   gphi *phi = psi->phi ();
   tree atype, mtype, val, res = PHI_RESULT (phi);
 
-  if (virtual_operand_p (res) || res == main_iv)
-    {
-      gsi_next (psi);
-      return;
-    }
-
-  if (!simple_iv (loop, loop, res, &iv, true))
-    {
-      gsi_next (psi);
-      return;
-    }
-
   remove_phi_node (psi, false);
 
   atype = TREE_TYPE (res);
   mtype = POINTER_TYPE_P (atype) ? sizetype : atype;
-  val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv.step),
+  val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv->step),
 		     fold_convert (mtype, main_iv));
   val = fold_build2 (POINTER_TYPE_P (atype)
 		     ? POINTER_PLUS_EXPR : PLUS_EXPR,
-		     atype, unshare_expr (iv.base), val);
+		     atype, unshare_expr (iv->base), val);
   val = force_gimple_operand_gsi (gsi, val, false, NULL_TREE, true,
 				  GSI_SAME_STMT);
   stmt = gimple_build_assign (res, val);
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 }
 
+/* Return map of phi node result to affine_iv, for all phis in LOOPS.  */
+
+static void
+get_all_phi_affine_ivs (loop_p loop, hash_map<tree,affine_iv> *simple_ivs)
+{
+  unsigned i;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+  gphi_iterator psi;
+
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+
+      if (bb->loop_father != loop)
+	continue;
+
+      for (psi = gsi_start_phis (bb); !gsi_end_p (psi);
+	   gsi_next (&psi))
+	{
+	  gphi *phi = psi.phi ();
+	  affine_iv iv;
+	  tree res = PHI_RESULT (phi);
+	  if (simple_iv (loop, loop, res, &iv, true))
+	    simple_ivs->put (res, iv);
+	}
+    }
+
+  free (bbs);
+}
+
 /* Rewrite all the phi nodes of LOOP in function of the main induction
    variable MAIN_IV.  */
 
 static void
-rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
+rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv,
+			       hash_map<tree,affine_iv> *simple_ivs)
 {
   unsigned i;
   basic_block *bbs = get_loop_body_in_dom_order (loop);
@@ -1486,7 +1504,25 @@ rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
 	continue;
 
       for (psi = gsi_start_phis (bb); !gsi_end_p (psi); )
-	rewrite_phi_with_iv (loop, &psi, &gsi, main_iv);
+	{
+	  affine_iv *iv;
+	  gphi *phi = psi.phi ();
+	  tree res = PHI_RESULT (phi);
+	  if (virtual_operand_p (res) || res == main_iv)
+	    {
+	      gsi_next (&psi);
+	      continue;
+	    }
+
+	  iv = simple_ivs->get (PHI_RESULT (phi));
+	  if (!iv)
+	    {
+	      gsi_next (&psi);
+	      continue;
+	    }
+
+	  rewrite_phi_with_iv (&psi, &gsi, main_iv, iv);
+	}
     }
 
   free (bbs);
@@ -1554,10 +1590,14 @@ canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
     gsi = gsi_last_bb (loop->latch);
   else
     gsi = gsi_last_nondebug_bb (loop->header);
+
+  hash_map<tree,affine_iv> simple_ivs;
+  get_all_phi_affine_ivs (loop, &simple_ivs);
+
   create_iv (build_int_cst_type (type, 0), build_int_cst (type, 1), NULL_TREE,
 	     loop, &gsi, bump_in_latch, &var_before, NULL);
 
-  rewrite_all_phi_nodes_with_iv (loop, var_before);
+  rewrite_all_phi_nodes_with_iv (loop, var_before, &simple_ivs);
 
   stmt = as_a <gcond *> (last_stmt (exit->src));
   /* Make the loop exit if the control condition is not satisfied.  */
Bin.Cheng March 21, 2018, 3:31 p.m. UTC | #5
On Wed, Mar 21, 2018 at 3:06 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 03/12/2018 01:14 PM, Richard Biener wrote:
>>
>> On Thu, 22 Feb 2018, Tom de Vries wrote:
>>
>>> Hi,
>>>
>>> this patch fixes an ICE in the parloops pass.
>>>
>>> The ICE (when compiling the test-case in attached patch) follows from the
>>> fact
>>> that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
>>> "base all the induction variables in LOOP on a single control one":
>>> ...
>>>    /* Base all the induction variables in LOOP on a single control one.*/
>>>    canonicalize_loop_ivs (loop, &nit, true);
>>> ...
>>>
>>> This is caused by the fact that for one of the induction variables,
>>> simple_iv
>>> no longer returns true (as was the case at the start of
>>> gen_parallel_loop).
>>> This seems to be triggered by the fact that during loop_version we call
>>> scev_reset (although I'm not sure why when recalculating scev info we're
>>> not
>>> reaching the same conclusion as before).
>>
>> I guess the real reason is that canonicalize_loop_ivs calls
>> create_iv first which in the parloop case with bump-in-latch true
>> and then incrementally re-writes PHIs, eventually wrecking other
>> PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
>> first one but the rewritten ones depend on the new IV already.
>>
>> So the better fix (maybe not 100% enough) would be to re-organize
>> canonicalize_loop_ivs to first do analysis of all PHIs and then
>> rewrite them all.
>>
>
> Focusing on the re-organize first, is this sort of what you had in mind?

FYI, we have multiple interfaces manipulating on IVs, sometimes in a similar
way.  An example would be create_canonical_iv in tree-ssa-loop-ivcanon.c.

One proposal for discussion is to refactor such interfaces and put them in a
single source file, like tree-ssa-loop-ivopts.  For example of
create_canonical_iv,
it can be removed by simply adding the corresponding candidate in IVOPTs.

Thanks,
bin
>
> Bootstrapped and reg-tested on x86_64.
>
> Thanks,
> - Tom
Richard Biener March 21, 2018, 3:43 p.m. UTC | #6
On Wed, 21 Mar 2018, Tom de Vries wrote:

> On 03/12/2018 01:14 PM, Richard Biener wrote:
> > On Thu, 22 Feb 2018, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > this patch fixes an ICE in the parloops pass.
> > > 
> > > The ICE (when compiling the test-case in attached patch) follows from the
> > > fact
> > > that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
> > > "base all the induction variables in LOOP on a single control one":
> > > ...
> > >    /* Base all the induction variables in LOOP on a single control one.*/
> > >    canonicalize_loop_ivs (loop, &nit, true);
> > > ...
> > > 
> > > This is caused by the fact that for one of the induction variables,
> > > simple_iv
> > > no longer returns true (as was the case at the start of
> > > gen_parallel_loop).
> > > This seems to be triggered by the fact that during loop_version we call
> > > scev_reset (although I'm not sure why when recalculating scev info we're
> > > not
> > > reaching the same conclusion as before).
> > I guess the real reason is that canonicalize_loop_ivs calls
> > create_iv first which in the parloop case with bump-in-latch true
> > and then incrementally re-writes PHIs, eventually wrecking other
> > PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
> > first one but the rewritten ones depend on the new IV already.
> > 
> > So the better fix (maybe not 100% enough) would be to re-organize
> > canonicalize_loop_ivs to first do analysis of all PHIs and then
> > rewrite them all.
> > 
> 
> Focusing on the re-organize first, is this sort of what you had in mind?

Yes, though not sure why you need to have a hash-map here, just pushing
to a vector of PHIs would work, no?  Or alternatively a bitmap
of SSA versions so you have a way to match the gphi iterator with
the set of IVs.  But the vector should be sorted in PHI order so
just traversing the PHIs looking for the "next" PHI in the vector
would work I guess.

Does it help with the original issue btw?

Richard.

> Bootstrapped and reg-tested on x86_64.
> 
> Thanks,
> - Tom
>
Tom de Vries March 22, 2018, 3:32 p.m. UTC | #7
On 03/21/2018 04:43 PM, Richard Biener wrote:
> On Wed, 21 Mar 2018, Tom de Vries wrote:
> 
>> On 03/12/2018 01:14 PM, Richard Biener wrote:
>>> On Thu, 22 Feb 2018, Tom de Vries wrote:
>>>
>>>> Hi,
>>>>
>>>> this patch fixes an ICE in the parloops pass.
>>>>
>>>> The ICE (when compiling the test-case in attached patch) follows from the
>>>> fact
>>>> that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
>>>> "base all the induction variables in LOOP on a single control one":
>>>> ...
>>>>     /* Base all the induction variables in LOOP on a single control one.*/
>>>>     canonicalize_loop_ivs (loop, &nit, true);
>>>> ...
>>>>
>>>> This is caused by the fact that for one of the induction variables,
>>>> simple_iv
>>>> no longer returns true (as was the case at the start of
>>>> gen_parallel_loop).
>>>> This seems to be triggered by the fact that during loop_version we call
>>>> scev_reset (although I'm not sure why when recalculating scev info we're
>>>> not
>>>> reaching the same conclusion as before).
>>> I guess the real reason is that canonicalize_loop_ivs calls
>>> create_iv first which in the parloop case with bump-in-latch true
>>> and then incrementally re-writes PHIs, eventually wrecking other
>>> PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
>>> first one but the rewritten ones depend on the new IV already.
>>>
>>> So the better fix (maybe not 100% enough) would be to re-organize
>>> canonicalize_loop_ivs to first do analysis of all PHIs and then
>>> rewrite them all.
>>>
>>
>> Focusing on the re-organize first, is this sort of what you had in mind?
> 
> Yes, though not sure why you need to have a hash-map here, just pushing
> to a vector of PHIs would work, no?  Or alternatively a bitmap
> of SSA versions so you have a way to match the gphi iterator with
> the set of IVs.  But the vector should be sorted in PHI order so
> just traversing the PHIs looking for the "next" PHI in the vector
> would work I guess.

The reason I chose a hash map was robustness: to allow the maximum 
amount of change between analysis and transformation.

I've now used a vector of this struct:
...
struct phi_affine_iv
{
   gphi *phi;
   affine_iv iv;
};
...
and rewritten the transformation part to iterate over the vector.

Bootstrapped and reg-tested on x86_64.

Thanks,
- Tom
Split canonicalize_loop_ivs in analyze and rewrite part

2018-03-21  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-loop-manip.c (rewrite_phi_with_iv): Remove loop parameter.
	Add iv parameter.  Remove early-out tests.
	(struct phi_affine_iv): New struct.
	(get_all_phi_affine_ivs): New function.  Factor out of ...
	(rewrite_all_phi_nodes_with_iv): ... this.  Remove loop argument.  Add
	simple_ivs argument.  Rewrite to iterate over simple_ivs.
	(canonicalize_loop_ivs_1): Factor out of ...
	(canonicalize_loop_ivs): ... this. Call get_all_phi_affine_ivs and pass
	result as argument to canonicalize_loop_ivs_1.

---
 gcc/tree-ssa-loop-manip.c | 114 ++++++++++++++++++++++++++++++----------------
 1 file changed, 76 insertions(+), 38 deletions(-)

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index bf425af..ecda325 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -1430,79 +1430,99 @@ tree_unroll_loop (struct loop *loop, unsigned factor,
    induction variable MAIN_IV and insert the generated code at GSI.  */
 
 static void
-rewrite_phi_with_iv (loop_p loop,
-		     gphi_iterator *psi,
+rewrite_phi_with_iv (gphi_iterator *psi,
 		     gimple_stmt_iterator *gsi,
-		     tree main_iv)
+		     tree main_iv, affine_iv *iv)
 {
-  affine_iv iv;
   gassign *stmt;
   gphi *phi = psi->phi ();
   tree atype, mtype, val, res = PHI_RESULT (phi);
 
-  if (virtual_operand_p (res) || res == main_iv)
-    {
-      gsi_next (psi);
-      return;
-    }
-
-  if (!simple_iv (loop, loop, res, &iv, true))
-    {
-      gsi_next (psi);
-      return;
-    }
-
   remove_phi_node (psi, false);
 
   atype = TREE_TYPE (res);
   mtype = POINTER_TYPE_P (atype) ? sizetype : atype;
-  val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv.step),
+  val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv->step),
 		     fold_convert (mtype, main_iv));
   val = fold_build2 (POINTER_TYPE_P (atype)
 		     ? POINTER_PLUS_EXPR : PLUS_EXPR,
-		     atype, unshare_expr (iv.base), val);
+		     atype, unshare_expr (iv->base), val);
   val = force_gimple_operand_gsi (gsi, val, false, NULL_TREE, true,
 				  GSI_SAME_STMT);
   stmt = gimple_build_assign (res, val);
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 }
 
-/* Rewrite all the phi nodes of LOOP in function of the main induction
-   variable MAIN_IV.  */
+struct phi_affine_iv
+{
+  gphi *phi;
+  affine_iv iv;
+};
+
+/* Return map of phi node result to affine_iv, for all phis in LOOPS.  */
 
 static void
-rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
+get_all_phi_affine_ivs (struct loop *loop,
+			auto_vec<struct phi_affine_iv> *simple_ivs)
 {
-  unsigned i;
   basic_block *bbs = get_loop_body_in_dom_order (loop);
-  gphi_iterator psi;
 
-  for (i = 0; i < loop->num_nodes; i++)
+  for (unsigned i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = bbs[i];
-      gimple_stmt_iterator gsi = gsi_after_labels (bb);
-
       if (bb->loop_father != loop)
 	continue;
 
-      for (psi = gsi_start_phis (bb); !gsi_end_p (psi); )
-	rewrite_phi_with_iv (loop, &psi, &gsi, main_iv);
+      for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);
+	   gsi_next (&psi))
+	{
+	  gphi *phi = psi.phi ();
+	  tree res = PHI_RESULT (phi);
+	  if (virtual_operand_p (res))
+	    continue;
+
+	  affine_iv iv;
+	  if (!simple_iv (loop, loop, res, &iv, true))
+	    continue;
+
+	  struct phi_affine_iv elem;
+	  elem.phi = phi;
+	  elem.iv = iv;
+	  simple_ivs->safe_push (elem);
+	}
     }
 
   free (bbs);
 }
 
-/* Bases all the induction variables in LOOP on a single induction variable
-   (with base 0 and step 1), whose final value is compared with *NIT.  When the
-   IV type precision has to be larger than *NIT type precision, *NIT is
-   converted to the larger type, the conversion code is inserted before the
-   loop, and *NIT is updated to the new definition.  When BUMP_IN_LATCH is true,
-   the induction variable is incremented in the loop latch, otherwise it is
-   incremented in the loop header.  Return the induction variable that was
-   created.  */
+/* Rewrite all the phi nodes in SIMPLE_IVS in function of the main induction
+   variable MAIN_IV.  */
+
+static void
+rewrite_all_phi_nodes_with_iv (tree main_iv,
+			       auto_vec<struct phi_affine_iv> *simple_ivs)
+{
+  gimple_stmt_iterator gsi;
+  unsigned i;
+  struct phi_affine_iv *elem;
+
+  FOR_EACH_VEC_ELT (*simple_ivs, i, elem)
+    {
+      gphi *phi = elem->phi;
+      basic_block phi_bb = gimple_bb (phi);
+      if (i == 0 || gsi_bb (gsi) != phi_bb)
+	gsi = gsi_after_labels (phi_bb);
+      gphi_iterator psi = gsi_for_phi (phi);
+      rewrite_phi_with_iv (&psi, &gsi, main_iv, &elem->iv);
+    }
+}
+
+/* As canonicalize_loop_ivs, but with SIMPLE_IVS containing the phis that need
+   rewriting.  */
 
 tree
-canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
+canonicalize_loop_ivs_1 (struct loop *loop, tree *nit, bool bump_in_latch,
+			 auto_vec<struct phi_affine_iv> *simple_ivs)
 {
   unsigned precision = TYPE_PRECISION (TREE_TYPE (*nit));
   unsigned original_precision = precision;
@@ -1557,7 +1577,7 @@ canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
   create_iv (build_int_cst_type (type, 0), build_int_cst (type, 1), NULL_TREE,
 	     loop, &gsi, bump_in_latch, &var_before, NULL);
 
-  rewrite_all_phi_nodes_with_iv (loop, var_before);
+  rewrite_all_phi_nodes_with_iv (var_before, simple_ivs);
 
   stmt = as_a <gcond *> (last_stmt (exit->src));
   /* Make the loop exit if the control condition is not satisfied.  */
@@ -1576,3 +1596,21 @@ canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
 
   return var_before;
 }
+
+/* Bases all the induction variables in LOOP on a single induction variable
+   (with base 0 and step 1), whose final value is compared with *NIT.  When the
+   IV type precision has to be larger than *NIT type precision, *NIT is
+   converted to the larger type, the conversion code is inserted before the
+   loop, and *NIT is updated to the new definition.  When BUMP_IN_LATCH is true,
+   the induction variable is incremented in the loop latch, otherwise it is
+   incremented in the loop header.  Return the induction variable that was
+   created.  */
+
+tree
+canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
+{
+  auto_vec<struct phi_affine_iv> simple_ivs;
+  get_all_phi_affine_ivs (loop, &simple_ivs);
+
+  return canonicalize_loop_ivs_1 (loop, nit, bump_in_latch, &simple_ivs);
+}
Tom de Vries March 22, 2018, 3:44 p.m. UTC | #8
On 03/21/2018 04:43 PM, Richard Biener wrote:
> On Wed, 21 Mar 2018, Tom de Vries wrote:
> 
>> On 03/12/2018 01:14 PM, Richard Biener wrote:
>>> On Thu, 22 Feb 2018, Tom de Vries wrote:
>>>
>>>> Hi,
>>>>
>>>> this patch fixes an ICE in the parloops pass.
>>>>
>>>> The ICE (when compiling the test-case in attached patch) follows from the
>>>> fact
>>>> that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
>>>> "base all the induction variables in LOOP on a single control one":
>>>> ...
>>>>     /* Base all the induction variables in LOOP on a single control one.*/
>>>>     canonicalize_loop_ivs (loop, &nit, true);
>>>> ...
>>>>
>>>> This is caused by the fact that for one of the induction variables,
>>>> simple_iv
>>>> no longer returns true (as was the case at the start of
>>>> gen_parallel_loop).
>>>> This seems to be triggered by the fact that during loop_version we call
>>>> scev_reset (although I'm not sure why when recalculating scev info we're
>>>> not
>>>> reaching the same conclusion as before).
>>> I guess the real reason is that canonicalize_loop_ivs calls
>>> create_iv first which in the parloop case with bump-in-latch true
>>> and then incrementally re-writes PHIs, eventually wrecking other
>>> PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
>>> first one but the rewritten ones depend on the new IV already.
>>>
>>> So the better fix (maybe not 100% enough) would be to re-organize
>>> canonicalize_loop_ivs to first do analysis of all PHIs and then
>>> rewrite them all.
>>>
>>
>> Focusing on the re-organize first, is this sort of what you had in mind?
> 
> Yes, though not sure why you need to have a hash-map here, just pushing
> to a vector of PHIs would work, no?  Or alternatively a bitmap
> of SSA versions so you have a way to match the gphi iterator with
> the set of IVs.  But the vector should be sorted in PHI order so
> just traversing the PHIs looking for the "next" PHI in the vector
> would work I guess.
> 
> Does it help with the original issue btw?
> 

Not as such.

The original problem happens before canonicalize_loop_ivs.

Perhaps the easiest way to demonstrate the problem is this patch:
...
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 3a788cc..801fc46 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2254,6 +2254,38 @@ num_phis (basic_block bb, bool count_virtual_p)
    return nr_phis;
  }

+static unsigned
+get_nr_simple_ivs (struct loop *loop)
+{
+  unsigned cnt = 0;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+      if (bb->loop_father != loop)
+	continue;
+
+      for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);
+	   gsi_next (&psi))
+	{
+	  gphi *phi = psi.phi ();
+	  tree res = PHI_RESULT (phi);
+	  if (virtual_operand_p (res))
+	    continue;
+
+	  affine_iv iv;
+	  if (!simple_iv (loop, loop, res, &iv, true))
+	    continue;
+	  cnt++;
+	}
+    }
+
+  free (bbs);
+
+  return cnt;
+}
+
  /* Generates code to execute the iterations of LOOP in N_THREADS
     threads in parallel, which can be 0 if that number is to be determined
     later.
@@ -2378,11 +2410,15 @@ gen_parallel_loop (struct loop *loop,
        initialize_original_copy_tables ();

        /* We assume that the loop usually iterates a lot.  */
+      fprintf (stderr, "nr simple ivs: %u @ line %d\n",
+	       get_nr_simple_ivs (loop), __LINE__);
        loop_version (loop, many_iterations_cond, NULL,
  		    profile_probability::likely (),
  		    profile_probability::unlikely (),
  		    profile_probability::likely (),
  		    profile_probability::unlikely (), true);
+      fprintf (stderr, "nr simple ivs: %u @ line %d\n",
+	       get_nr_simple_ivs (loop), __LINE__);
        update_ssa (TODO_update_ssa);
        free_original_copy_tables ();
      }
...

and this output for pr83126.c:
...
nr simple ivs: 3 @ line 2414
nr simple ivs: 2 @ line 2421
...

I've written attached follow-up patch that works around that problem by 
analyzing the phis before calling loop_version. It works for the 
example, and bootstraps fine, but I'm not familiar enough with 
loop_version to known whether this is safe or not.

Thanks,
- Tom
[parloops] Fix canonicalize_loop_ivs in gen_parallel_loop

2018-03-21  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/83126
	* tree-parloops.c (gen_parallel_loop): Inline canonicalize_loop_ivs.
	Assert that canonicalize_loop_ivs succeeds.
	* tree-ssa-loop-manip.c (get_all_phi_affine_ivs): Remove static.
	(struct phi_affine_iv): Move ...
	* tree-ssa-loop-manip.h (struct phi_affine_iv): ... here.
	(get_all_phi_affine_ivs, canonicalize_loop_ivs_1): Declare.

---
 gcc/tree-parloops.c       | 27 ++++++---------------------
 gcc/tree-ssa-loop-manip.c |  8 +-------
 gcc/tree-ssa-loop-manip.h | 10 ++++++++++
 3 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 3a788cc..9c2d19a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2339,6 +2339,9 @@ gen_parallel_loop (struct loop *loop,
     we should compute nit = n * m, not nit = n.  
     Also may_be_zero handling would need to be adjusted.  */
 
+  auto_vec<struct phi_affine_iv> simple_ivs;
+  get_all_phi_affine_ivs (loop, &simple_ivs);
+
   type = TREE_TYPE (niter->niter);
   nit = force_gimple_operand (unshare_expr (niter->niter), &stmts, true,
 			      NULL_TREE);
@@ -2388,27 +2391,9 @@ gen_parallel_loop (struct loop *loop,
     }
 
   /* Base all the induction variables in LOOP on a single control one.  */
-  canonicalize_loop_ivs (loop, &nit, true);
-  if (num_phis (loop->header, false) != reduction_list->elements () + 1)
-    {
-      /* The call to canonicalize_loop_ivs above failed to "base all the
-	 induction variables in LOOP on a single control one".  Do damage
-	 control.  */
-      basic_block preheader = loop_preheader_edge (loop)->src;
-      basic_block cond_bb = single_pred (preheader);
-      gcond *cond = as_a <gcond *> (gsi_stmt (gsi_last_bb (cond_bb)));
-      gimple_cond_make_true (cond);
-      update_stmt (cond);
-      /* We've gotten rid of the duplicate loop created by loop_version, but
-	 we can't undo whatever canonicalize_loop_ivs has done.
-	 TODO: Fix this properly by ensuring that the call to
-	 canonicalize_loop_ivs succeeds.  */
-      if (dump_file
-	  && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "canonicalize_loop_ivs failed for loop %d,"
-		 " aborting transformation\n", loop->num);
-      return;
-    }
+  canonicalize_loop_ivs_1 (loop, &nit, true, &simple_ivs);
+  gcc_assert (num_phis (loop->header, false)
+	      == reduction_list->elements () + 1);
 
   /* Ensure that the exit condition is the first statement in the loop.
      The common case is that latch of the loop is empty (apart from the
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index ecda325..6014384 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -1453,15 +1453,9 @@ rewrite_phi_with_iv (gphi_iterator *psi,
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 }
 
-struct phi_affine_iv
-{
-  gphi *phi;
-  affine_iv iv;
-};
-
 /* Return map of phi node result to affine_iv, for all phis in LOOPS.  */
 
-static void
+void
 get_all_phi_affine_ivs (struct loop *loop,
 			auto_vec<struct phi_affine_iv> *simple_ivs)
 {
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 390ac6f..ef439be 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #ifndef GCC_TREE_SSA_LOOP_MANIP_H
 #define GCC_TREE_SSA_LOOP_MANIP_H
+#include "tree-ssa-loop.h"
 
 typedef void (*transform_callback)(struct loop *, void *);
 
@@ -56,6 +57,15 @@ extern void tree_unroll_loop (struct loop *, unsigned,
 			      edge, struct tree_niter_desc *);
 extern tree canonicalize_loop_ivs (struct loop *, tree *, bool);
 
+struct phi_affine_iv
+{
+  gphi *phi;
+  affine_iv iv;
+};
+extern void get_all_phi_affine_ivs (struct loop *,
+				    auto_vec<struct phi_affine_iv> *);
+extern tree canonicalize_loop_ivs_1 (struct loop *, tree *, bool,
+				     auto_vec<struct phi_affine_iv> *);
 
 
 #endif /* GCC_TREE_SSA_LOOP_MANIP_H */
diff mbox series

Patch

[parloops] Use cached affine_ivs canonicalize_loop_ivs

2018-02-22  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/83126
	* tree-ssa-loop-manip.c (rewrite_phi_with_iv): Add and handle piv
	parameter.
	(rewrite_all_phi_nodes_with_iv, canonicalize_loop_ivs): Add and handle
	simple_ivs parameter.
	* tree-ssa-loop-manip.h (canonicalize_loop_ivs): Declare simple_ivs
	parameter and default to NULL.
	* tree-parloops.c (num_phis): New function.
	(gen_parallel_loop): Add simple_ivs argument to canonicalize_loop_ivs
	call.

---
 gcc/testsuite/gcc.dg/graphite/pr83126.c | 19 +++++++++++++++
 gcc/tree-parloops.c                     | 43 +++++++++++++++++++++++++++++++--
 gcc/tree-ssa-loop-manip.c               | 41 ++++++++++++++++++++++++-------
 gcc/tree-ssa-loop-manip.h               |  6 ++++-
 4 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/graphite/pr83126.c b/gcc/testsuite/gcc.dg/graphite/pr83126.c
new file mode 100644
index 0000000..663d059
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr83126.c
@@ -0,0 +1,19 @@ 
+/* { dg-additional-options "-w -ftree-parallelize-loops=2 -floop-parallelize-all -O1" }  */
+
+void
+ew (unsigned short int c9, int stuff)
+{
+  int e1;
+
+  for (;;)
+    {
+      unsigned int *by = &e1;
+      int *fd = &stuff;
+
+      *fd = c9;
+      fd = *fd;
+      if (*fd != 0)
+	for (*by = 0; *by < 2; ++*by)
+	  c9 *= e1;
+    }
+}
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e44ad5e..5971680 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2235,6 +2235,25 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   calculate_dominance_info (CDI_DOMINATORS);
 }
 
+/* Return number of phis in bb.  If COUNT_VIRTUAL_P is false, don't count the
+   virtual phi.  */
+
+static unsigned int
+num_phis (basic_block bb, bool count_virtual_p)
+{
+  unsigned int nr_phis = 0;
+  gphi_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      if (!count_virtual_p && virtual_operand_p (PHI_RESULT (gsi.phi ())))
+	continue;
+
+      nr_phis++;
+    }
+
+  return nr_phis;
+}
+
 /* Generates code to execute the iterations of LOOP in N_THREADS
    threads in parallel, which can be 0 if that number is to be determined
    later.
@@ -2326,6 +2345,24 @@  gen_parallel_loop (struct loop *loop,
   if (stmts)
     gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
 
+  /* The loop_version call below calls gimple_duplicate_loop_to_header_edge,
+     which calls scev_reset.  So a loop header phi node that classifies as
+     simple_iv here, may not classify as such anymore after loop_version, which
+     will make canonicalize_loop_ivs fail to "base all the induction variables
+     in LOOP on a single control one".  We fix this by storing the affine_iv
+     result of the simple_iv call in a map from phi node result to affine_iv,
+     and passing that map as an argument to canonicalize_loop_ivs.  */
+  std::map<tree,affine_iv> simple_ivs;
+  gphi_iterator psi;
+  for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi))
+    {
+      gphi *phi = psi.phi ();
+      affine_iv iv;
+      tree res = PHI_RESULT (phi);
+      if (simple_iv (loop, loop, res, &iv, true))
+	simple_ivs.insert (std::pair<tree,affine_iv> (res, iv));
+    }
+
   if (!oacc_kernels_p)
     {
       if (loop->inner)
@@ -2364,12 +2401,14 @@  gen_parallel_loop (struct loop *loop,
 		    profile_probability::unlikely (),
 		    profile_probability::likely (),
 		    profile_probability::unlikely (), true);
-      update_ssa (TODO_update_ssa);
       free_original_copy_tables ();
     }
 
   /* Base all the induction variables in LOOP on a single control one.  */
-  canonicalize_loop_ivs (loop, &nit, true);
+  canonicalize_loop_ivs (loop, &nit, true, &simple_ivs);
+  gcc_assert (num_phis (loop->header, false)
+	      == reduction_list->elements () + 1);
+  update_ssa (TODO_update_ssa);
 
   /* Ensure that the exit condition is the first statement in the loop.
      The common case is that latch of the loop is empty (apart from the
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index bf425af..1054ef3 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -43,6 +43,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-inline.h"
+#include <map>
 
 /* All bitmaps for rewriting into loop-closed SSA go on this obstack,
    so that we can free them all at once.  */
@@ -1427,13 +1428,14 @@  tree_unroll_loop (struct loop *loop, unsigned factor,
 }
 
 /* Rewrite the phi node at position PSI in function of the main
-   induction variable MAIN_IV and insert the generated code at GSI.  */
+   induction variable MAIN_IV and insert the generated code at GSI.  PIV
+   optionally contains the associcated affine_iv.  */
 
 static void
 rewrite_phi_with_iv (loop_p loop,
 		     gphi_iterator *psi,
 		     gimple_stmt_iterator *gsi,
-		     tree main_iv)
+		     tree main_iv, affine_iv *piv)
 {
   affine_iv iv;
   gassign *stmt;
@@ -1446,7 +1448,9 @@  rewrite_phi_with_iv (loop_p loop,
       return;
     }
 
-  if (!simple_iv (loop, loop, res, &iv, true))
+  if (piv)
+    iv = *piv;
+  else if (!simple_iv (loop, loop, res, &iv, true))
     {
       gsi_next (psi);
       return;
@@ -1468,10 +1472,12 @@  rewrite_phi_with_iv (loop_p loop,
 }
 
 /* Rewrite all the phi nodes of LOOP in function of the main induction
-   variable MAIN_IV.  */
+   variable MAIN_IV.  SIMPLE_IVS contains an optional map from phi node result
+   to affive_iv.  */
 
 static void
-rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
+rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv,
+			       std::map<tree,affine_iv> *simple_ivs)
 {
   unsigned i;
   basic_block *bbs = get_loop_body_in_dom_order (loop);
@@ -1486,7 +1492,22 @@  rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
 	continue;
 
       for (psi = gsi_start_phis (bb); !gsi_end_p (psi); )
-	rewrite_phi_with_iv (loop, &psi, &gsi, main_iv);
+	{
+	  affine_iv *piv = NULL;
+	  affine_iv iv;
+	  gphi *phi = psi.phi ();
+	  if (simple_ivs)
+	    {
+	      std::map<tree,affine_iv>::iterator search
+		= simple_ivs->find (PHI_RESULT (phi));
+	      if (search != simple_ivs->end ())
+		{
+		  iv = search->second;
+		  piv = &iv;
+		}
+	    }
+	  rewrite_phi_with_iv (loop, &psi, &gsi, main_iv, piv);
+	}
     }
 
   free (bbs);
@@ -1498,11 +1519,13 @@  rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
    converted to the larger type, the conversion code is inserted before the
    loop, and *NIT is updated to the new definition.  When BUMP_IN_LATCH is true,
    the induction variable is incremented in the loop latch, otherwise it is
-   incremented in the loop header.  Return the induction variable that was
+   incremented in the loop header.  SIMPLE_IVS contains an optional map from
+   phi node result to affive_iv.  Return the induction variable that was
    created.  */
 
 tree
-canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
+canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch,
+		       std::map<tree,affine_iv> *simple_ivs)
 {
   unsigned precision = TYPE_PRECISION (TREE_TYPE (*nit));
   unsigned original_precision = precision;
@@ -1557,7 +1580,7 @@  canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
   create_iv (build_int_cst_type (type, 0), build_int_cst (type, 1), NULL_TREE,
 	     loop, &gsi, bump_in_latch, &var_before, NULL);
 
-  rewrite_all_phi_nodes_with_iv (loop, var_before);
+  rewrite_all_phi_nodes_with_iv (loop, var_before, simple_ivs);
 
   stmt = as_a <gcond *> (last_stmt (exit->src));
   /* Make the loop exit if the control condition is not satisfied.  */
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 390ac6f..a96ccb7 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -20,6 +20,9 @@  along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_TREE_SSA_LOOP_MANIP_H
 #define GCC_TREE_SSA_LOOP_MANIP_H
 
+#include <map>
+#include "tree-ssa-loop.h"
+
 typedef void (*transform_callback)(struct loop *, void *);
 
 extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
@@ -54,7 +57,8 @@  extern void tree_transform_and_unroll_loop (struct loop *, unsigned,
 					    transform_callback, void *);
 extern void tree_unroll_loop (struct loop *, unsigned,
 			      edge, struct tree_niter_desc *);
-extern tree canonicalize_loop_ivs (struct loop *, tree *, bool);
+extern tree canonicalize_loop_ivs (struct loop *, tree *, bool,
+				   std::map<tree,affine_iv> *simple_ivs = NULL);