diff mbox

Fix PR77283

Message ID alpine.LSU.2.20.1701121550340.14052@zhemvz.fhfr.qr
State New
Headers show

Commit Message

Richard Biener Jan. 12, 2017, 2:55 p.m. UTC
The following fixes PR77283, path splitting being overly aggressive
and causing loop unrolling not to happen (because how it distorts the
CFG).

It is a aim at creating a cost model (there's none apart from
not duplicating too much stmts) by means of the observation that
we'd have to have PHI nodes in the joiner to have any possibility
of CSE opportunities being exposed by duplicating it or threading
opportunities being exposed across the new latch.  That includes
virtual PHIs for CSE (so any load/store) but not for the threading
(but IMHO threading should figure all this out on its own without
the requirement for somebody else duplicating the joiner).

Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x
libquantum regression is reportedly fixed by this.  I had to adjust
gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because
I (and of course the cost model) can't see any reason to do it.

Ok for trunk?

Thanks,
Richard.

2017-01-12  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/77283
	* gimple-ssa-split-paths.c: Include gimple-ssa.h, tree-phinodes.h
	and ssa-iterators.h.
	(is_feasible_trace): Implement a cost model based on joiner
	PHI node uses.

	* gcc.dg/tree-ssa/split-path-7.c: Adjust.
	* gcc.dg/tree-ssa/split-path-8.c: New testcase.
	* gcc.dg/tree-ssa/split-path-9.c: Likewise.

Comments

Jeff Law Jan. 12, 2017, 11:29 p.m. UTC | #1
On 01/12/2017 07:55 AM, Richard Biener wrote:
>
> The following fixes PR77283, path splitting being overly aggressive
> and causing loop unrolling not to happen (because how it distorts the
> CFG).
>
> It is a aim at creating a cost model (there's none apart from
> not duplicating too much stmts) by means of the observation that
> we'd have to have PHI nodes in the joiner to have any possibility
> of CSE opportunities being exposed by duplicating it or threading
> opportunities being exposed across the new latch.  That includes
> virtual PHIs for CSE (so any load/store) but not for the threading
> (but IMHO threading should figure all this out on its own without
> the requirement for somebody else duplicating the joiner).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x
> libquantum regression is reportedly fixed by this.  I had to adjust
> gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because
> I (and of course the cost model) can't see any reason to do it.
I went back and reviewed the discussion from last year.  The conclusion 
for linit (split-path-7.c) was that there was a path we could split, but 
that there was no real benefit in doing so at the tree level.

The more general conclusion was that path splitting rarely exposes 
CSE/DCE opportunities, contrary to the original motivation (the adpcm 
encoder is the exception).  What path splitting does more often is 
remove an unconditional branch in diamond shaped sub-graphs.

In an ideal world, raw path splitting would move into the RTL pipeline 
since it's primary value is to eliminate jumps and we'd use some kind of 
PHI partitioning to handle cases where constant values from some paths 
of control allow simplification at use sites.  It's just path isolation.

I'll try to get to the rest of the review tomorrow tonight/tomorrow.

jeff
Jeff Law Jan. 13, 2017, 3:52 a.m. UTC | #2
On 01/12/2017 07:55 AM, Richard Biener wrote:
>
> The following fixes PR77283, path splitting being overly aggressive
> and causing loop unrolling not to happen (because how it distorts the
> CFG).
>
> It is a aim at creating a cost model (there's none apart from
> not duplicating too much stmts) by means of the observation that
> we'd have to have PHI nodes in the joiner to have any possibility
> of CSE opportunities being exposed by duplicating it or threading
> opportunities being exposed across the new latch.  That includes
> virtual PHIs for CSE (so any load/store) but not for the threading
> (but IMHO threading should figure all this out on its own without
> the requirement for somebody else duplicating the joiner).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x
> libquantum regression is reportedly fixed by this.  I had to adjust
> gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because
> I (and of course the cost model) can't see any reason to do it.
>
> Ok for trunk?
>
> Thanks,
> Richard.
>
> 2017-01-12  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/77283
> 	* gimple-ssa-split-paths.c: Include gimple-ssa.h, tree-phinodes.h
> 	and ssa-iterators.h.
> 	(is_feasible_trace): Implement a cost model based on joiner
> 	PHI node uses.
>
> 	* gcc.dg/tree-ssa/split-path-7.c: Adjust.
> 	* gcc.dg/tree-ssa/split-path-8.c: New testcase.
> 	* gcc.dg/tree-ssa/split-path-9.c: Likewise.
So I think the only concern is split-path-7.  My memory is hazy, but I 
suspect split-path-7 shows the case where path splitting's CFG 
manipulations can result in fewer jumps for diamond sub-graphs.  You 
might see assembly code improvements due to path splitting on this test 
for the microblaze port.

Certainly the code in gimple-ssa-split-paths.c that you're adding is an 
improvement and brings gimple path splitting closer to its intended 
purpose.  I don't think regressing split-path-7 should block this 
improvement, but we would want a PR to track the code quality regression.

So I think it's OK for the trunk and if it shows a code quality 
regression for split-path-7 on the microblaze port that we should have a 
distinct PR to track that issue (which is probably best solved in 
bb-reorder).

Thanks,
Jeff
diff mbox

Patch

Index: gcc/gimple-ssa-split-paths.c
===================================================================
--- gcc/gimple-ssa-split-paths.c	(revision 244350)
+++ gcc/gimple-ssa-split-paths.c	(working copy)
@@ -32,6 +32,9 @@  along with GCC; see the file COPYING3.
 #include "tracer.h"
 #include "predict.h"
 #include "params.h"
+#include "gimple-ssa.h"
+#include "tree-phinodes.h"
+#include "ssa-iterators.h"
 
 /* Given LATCH, the latch block in a loop, see if the shape of the
    path reaching LATCH is suitable for being split by duplication.
@@ -200,6 +203,58 @@  is_feasible_trace (basic_block bb)
 	}
     }
 
+  /* If the joiner has no PHIs with useful uses there is zero chance
+     of CSE/DCE/jump-threading possibilities exposed by duplicating it.  */
+  bool found_useful_phi = false;
+  for (gphi_iterator si = gsi_start_phis (bb); ! gsi_end_p (si);
+       gsi_next (&si))
+    {
+      gphi *phi = si.phi ();
+      use_operand_p use_p;
+      imm_use_iterator iter;
+      FOR_EACH_IMM_USE_FAST (use_p, iter, gimple_phi_result (phi))
+	{
+	  gimple *stmt = USE_STMT (use_p);
+	  if (is_gimple_debug (stmt))
+	    continue;
+	  /* If there's a use in the joiner this might be a CSE/DCE
+	     opportunity.  */
+	  if (gimple_bb (stmt) == bb)
+	    {
+	      found_useful_phi = true;
+	      break;
+	    }
+	  /* If the use is on a loop header PHI and on one path the
+	     value is unchanged this might expose a jump threading
+	     opportunity.  */
+	  if (gimple_code (stmt) == GIMPLE_PHI
+	      && gimple_bb (stmt) == bb->loop_father->header
+	      /* But for memory the PHI alone isn't good enough.  */
+	      && ! virtual_operand_p (gimple_phi_result (stmt)))
+	    {
+	      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+		if (gimple_phi_arg_def (phi, i) == gimple_phi_result (stmt))
+		  {
+		    found_useful_phi = true;
+		    break;
+		  }
+	      if (found_useful_phi)
+		break;
+	    }
+	}
+      if (found_useful_phi)
+	break;
+    }
+  if (! found_useful_phi)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Block %d is a join that does not expose CSE/DCE/jump-thread "
+		 "opportunities when duplicated.\n",
+		 bb->index);
+      return false;
+    }
+
   /* We may want something here which looks at dataflow and tries
      to guess if duplication of BB is likely to result in simplification
      of instructions in BB in either the original or the duplicate.  */
Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c	(revision 244350)
+++ gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c	(working copy)
@@ -91,4 +91,4 @@  linit ()
 	}
     }
 }
-/* { dg-final { scan-tree-dump-times "Duplicating join block" 2 "split-paths" } } */
+/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c	(working copy)
@@ -0,0 +1,14 @@ 
+/* PR77283 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-split-paths-details" } */
+
+void
+foo (double *x, double *a, double *b, long n, double limit)
+{
+  long i;
+  for (i=0; i < n; i++)
+    if (a[i] < limit)
+      x[i] = b[i];
+}
+
+/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c	(working copy)
@@ -0,0 +1,17 @@ 
+/* PR77366 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-split-paths-details" } */
+
+void
+foo(unsigned int size, unsigned int *state)
+{
+  unsigned int i;
+
+  for(i = 0; i < size; i++)
+    {
+      if(state[i] & 1)
+	state[i] ^= 1;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */