Message ID | alpine.LSU.2.20.1701121550340.14052@zhemvz.fhfr.qr |
---|---|
State | New |
Headers | show |
On 01/12/2017 07:55 AM, Richard Biener wrote: > > The following fixes PR77283, path splitting being overly aggressive > and causing loop unrolling not to happen (because how it distorts the > CFG). > > It is a aim at creating a cost model (there's none apart from > not duplicating too much stmts) by means of the observation that > we'd have to have PHI nodes in the joiner to have any possibility > of CSE opportunities being exposed by duplicating it or threading > opportunities being exposed across the new latch. That includes > virtual PHIs for CSE (so any load/store) but not for the threading > (but IMHO threading should figure all this out on its own without > the requirement for somebody else duplicating the joiner). > > Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x > libquantum regression is reportedly fixed by this. I had to adjust > gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because > I (and of course the cost model) can't see any reason to do it. I went back and reviewed the discussion from last year. The conclusion for linit (split-path-7.c) was that there was a path we could split, but that there was no real benefit in doing so at the tree level. The more general conclusion was that path splitting rarely exposes CSE/DCE opportunities, contrary to the original motivation (the adpcm encoder is the exception). What path splitting does more often is remove an unconditional branch in diamond shaped sub-graphs. In an ideal world, raw path splitting would move into the RTL pipeline since it's primary value is to eliminate jumps and we'd use some kind of PHI partitioning to handle cases where constant values from some paths of control allow simplification at use sites. It's just path isolation. I'll try to get to the rest of the review tomorrow tonight/tomorrow. jeff
On 01/12/2017 07:55 AM, Richard Biener wrote: > > The following fixes PR77283, path splitting being overly aggressive > and causing loop unrolling not to happen (because how it distorts the > CFG). > > It is a aim at creating a cost model (there's none apart from > not duplicating too much stmts) by means of the observation that > we'd have to have PHI nodes in the joiner to have any possibility > of CSE opportunities being exposed by duplicating it or threading > opportunities being exposed across the new latch. That includes > virtual PHIs for CSE (so any load/store) but not for the threading > (but IMHO threading should figure all this out on its own without > the requirement for somebody else duplicating the joiner). > > Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x > libquantum regression is reportedly fixed by this. I had to adjust > gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because > I (and of course the cost model) can't see any reason to do it. > > Ok for trunk? > > Thanks, > Richard. > > 2017-01-12 Richard Biener <rguenther@suse.de> > > PR tree-optimization/77283 > * gimple-ssa-split-paths.c: Include gimple-ssa.h, tree-phinodes.h > and ssa-iterators.h. > (is_feasible_trace): Implement a cost model based on joiner > PHI node uses. > > * gcc.dg/tree-ssa/split-path-7.c: Adjust. > * gcc.dg/tree-ssa/split-path-8.c: New testcase. > * gcc.dg/tree-ssa/split-path-9.c: Likewise. So I think the only concern is split-path-7. My memory is hazy, but I suspect split-path-7 shows the case where path splitting's CFG manipulations can result in fewer jumps for diamond sub-graphs. You might see assembly code improvements due to path splitting on this test for the microblaze port. Certainly the code in gimple-ssa-split-paths.c that you're adding is an improvement and brings gimple path splitting closer to its intended purpose. I don't think regressing split-path-7 should block this improvement, but we would want a PR to track the code quality regression. So I think it's OK for the trunk and if it shows a code quality regression for split-path-7 on the microblaze port that we should have a distinct PR to track that issue (which is probably best solved in bb-reorder). Thanks, Jeff
Index: gcc/gimple-ssa-split-paths.c =================================================================== --- gcc/gimple-ssa-split-paths.c (revision 244350) +++ gcc/gimple-ssa-split-paths.c (working copy) @@ -32,6 +32,9 @@ along with GCC; see the file COPYING3. #include "tracer.h" #include "predict.h" #include "params.h" +#include "gimple-ssa.h" +#include "tree-phinodes.h" +#include "ssa-iterators.h" /* Given LATCH, the latch block in a loop, see if the shape of the path reaching LATCH is suitable for being split by duplication. @@ -200,6 +203,58 @@ is_feasible_trace (basic_block bb) } } + /* If the joiner has no PHIs with useful uses there is zero chance + of CSE/DCE/jump-threading possibilities exposed by duplicating it. */ + bool found_useful_phi = false; + for (gphi_iterator si = gsi_start_phis (bb); ! gsi_end_p (si); + gsi_next (&si)) + { + gphi *phi = si.phi (); + use_operand_p use_p; + imm_use_iterator iter; + FOR_EACH_IMM_USE_FAST (use_p, iter, gimple_phi_result (phi)) + { + gimple *stmt = USE_STMT (use_p); + if (is_gimple_debug (stmt)) + continue; + /* If there's a use in the joiner this might be a CSE/DCE + opportunity. */ + if (gimple_bb (stmt) == bb) + { + found_useful_phi = true; + break; + } + /* If the use is on a loop header PHI and on one path the + value is unchanged this might expose a jump threading + opportunity. */ + if (gimple_code (stmt) == GIMPLE_PHI + && gimple_bb (stmt) == bb->loop_father->header + /* But for memory the PHI alone isn't good enough. */ + && ! virtual_operand_p (gimple_phi_result (stmt))) + { + for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i) + if (gimple_phi_arg_def (phi, i) == gimple_phi_result (stmt)) + { + found_useful_phi = true; + break; + } + if (found_useful_phi) + break; + } + } + if (found_useful_phi) + break; + } + if (! found_useful_phi) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Block %d is a join that does not expose CSE/DCE/jump-thread " + "opportunities when duplicated.\n", + bb->index); + return false; + } + /* We may want something here which looks at dataflow and tries to guess if duplication of BB is likely to result in simplification of instructions in BB in either the original or the duplicate. */ Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c (revision 244350) +++ gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c (working copy) @@ -91,4 +91,4 @@ linit () } } } -/* { dg-final { scan-tree-dump-times "Duplicating join block" 2 "split-paths" } } */ +/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c (nonexistent) +++ gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c (working copy) @@ -0,0 +1,14 @@ +/* PR77283 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-split-paths-details" } */ + +void +foo (double *x, double *a, double *b, long n, double limit) +{ + long i; + for (i=0; i < n; i++) + if (a[i] < limit) + x[i] = b[i]; +} + +/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c (nonexistent) +++ gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c (working copy) @@ -0,0 +1,17 @@ +/* PR77366 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-split-paths-details" } */ + +void +foo(unsigned int size, unsigned int *state) +{ + unsigned int i; + + for(i = 0; i < size; i++) + { + if(state[i] & 1) + state[i] ^= 1; + } +} + +/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */